202675 A Multi-layered Approach to Race Imputation using Medical Claims, Name Analyses, Inferred-Familial Connections and a Geographic Information System
Monday, November 9, 2009: 12:45 PM
Objective. To develop an algorithm that estimates race for members whose race is unknown, using a combination of medical claims, name analyses, inferred-familial connections and geographic imputation from census data.
Data Sources. An internally maintained employee database (n=5,439) and Medicaid members with a known single race enrolled as of September 2008 (n=348,994) in a large southeastern managed care organization.
Study Design. A convenience sample approximating an 85/15 split of Medicaid members enrolled as of September 2008 was created for modeling: 300,000 members were used for development, and a hold-out (test) sample (n=48,994) was drawn by simple random sampling. The four-level imputation algorithm used 1) race-name associations from the Medicaid development dataset, U.S. Census Bureau and web resources; 2) geo-imputation based on census block population information; 3) race-biased medical claims diagnoses (e.g., sickle cell disease for African-Americans); and 4) after each imputation stage, assignment of the race of a same-household member, if one was known, to members still without a race (i.e., inferred-familial connection). To determine accuracy, imputed race was compared with members' known race in a pooled response (n=54,433) combining the test dataset and the internally maintained employee database. We tested for age and gender effects on false-positive race predictions using backward elimination within a logistic regression model.
Principal Findings. We assigned race to 51,184 (94.0%) members of the pooled dataset, with an overall accuracy of 85.8%. Accuracy by race was: Asian, 43.4%; African-American, 70.0%; Hispanic, 83.9%; Native American, 7.6%; White, 90.4%. Medical claims diagnoses and the inferred-familial connection method imputed race for approximately 1% and 5%, respectively, of members who would otherwise have remained unassigned. Age (P=0.001; OR 1.002, 95% CI 1.001-1.003) and gender (P=0.047; OR 0.955, 95% CI 0.913-0.999) were significant in the model: race was marginally more likely to be predicted incorrectly for older members and for male members.
Conclusions.
We successfully developed a multi-layered approach to race imputation. Using healthcare claims data and familial inference methodology was a valuable addition to traditional methods that rely only on surname analysis and geo-imputation. Race information is commonly unavailable to commercial health insurance carriers because collecting these data may imply misuse in premium calculations and raise other discrimination concerns. However, racial disparities exist in the quality and quantity of health care that minority groups receive. Imputing race can help plans address racial disparities, identify risk factors associated with race, and mitigate them through proactive care management strategies.
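The four-level cascade described under Study Design can be sketched as a small Python program. This is a minimal illustration under stated assumptions, not the authors' implementation. The stage order (surname, then census block, then claims), the share cutoff `THRESHOLD`, the lookup tables `SURNAME_SHARES`, `BLOCK_SHARES` and `RACE_BIASED_DX`, and the rule that the first assigned household member supplies the race are all hypothetical. The abstract does not say how name and block associations were turned into assignments, so a simple "dominant share at or above a threshold" rule stands in for them. Coverage and accuracy are computed assuming accuracy is measured only over members who received a race.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical cutoff: a race is assigned only if its share reaches this value.
THRESHOLD = 0.8

# Hypothetical reference tables with illustrative values, not the study's data.
SURNAME_SHARES = {
    "NGUYEN": {"Asian": 0.95, "White": 0.05},
    "SMITH": {"White": 0.70, "African-American": 0.25, "Hispanic": 0.05},
}
BLOCK_SHARES = {"B1": {"African-American": 0.85, "White": 0.15}}
RACE_BIASED_DX = {"sickle_cell_disease": "African-American"}


@dataclass
class Member:
    member_id: str
    household_id: str
    surname: str
    block_id: str
    diagnoses: set = field(default_factory=set)
    race: Optional[str] = None  # imputed race; None until a stage assigns one


def dominant(shares: Optional[dict]) -> Optional[str]:
    """Return the race with the largest share if that share meets THRESHOLD."""
    if not shares:
        return None
    race, share = max(shares.items(), key=lambda kv: kv[1])
    return race if share >= THRESHOLD else None


def by_surname(m: Member) -> Optional[str]:  # level 1: name analysis
    return dominant(SURNAME_SHARES.get(m.surname.upper()))


def by_block(m: Member) -> Optional[str]:  # level 2: census-block geo-imputation
    return dominant(BLOCK_SHARES.get(m.block_id))


def by_claims(m: Member) -> Optional[str]:  # level 3: race-biased diagnoses
    for dx in sorted(m.diagnoses):
        if dx in RACE_BIASED_DX:
            return RACE_BIASED_DX[dx]
    return None


def propagate_household(members: list) -> None:
    """Level 4: unassigned members take the race of an assigned household member."""
    known = {}
    for m in members:
        if m.race is not None:
            known.setdefault(m.household_id, m.race)  # first assigned member wins
    for m in members:
        if m.race is None and m.household_id in known:
            m.race = known[m.household_id]


def impute(members: list) -> None:
    stages: list[Callable[[Member], Optional[str]]] = [by_surname, by_block, by_claims]
    for stage in stages:
        for m in members:
            if m.race is None:
                m.race = stage(m)
        propagate_household(members)  # familial step runs after every stage


def evaluate(members: list, truth: dict) -> tuple:
    """Return (coverage, accuracy among members who were assigned a race)."""
    assigned = [m for m in members if m.race is not None]
    coverage = len(assigned) / len(members)
    correct = sum(1 for m in assigned if m.race == truth[m.member_id])
    accuracy = correct / len(assigned) if assigned else 0.0
    return coverage, accuracy


# Toy example with made-up members and made-up known races.
members = [
    Member("1", "H1", "Nguyen", "B9"),
    Member("2", "H1", "Tran", "B9"),
    Member("3", "H2", "Smith", "B1"),
    Member("4", "H3", "Doe", "B9", {"sickle_cell_disease"}),
    Member("5", "H4", "Doe", "B9"),
]
truth = {"1": "Asian", "2": "Asian", "3": "White", "4": "African-American", "5": "White"}
impute(members)
print([m.race for m in members])
print(evaluate(members, truth))
```

In this toy run, member 2 receives a race only through the household step, member 3 is assigned at the block level (incorrectly, against the made-up truth), member 4 is assigned from claims, and member 5 stays unassigned, giving coverage 0.8 and accuracy 0.75. Running the familial step after every stage, rather than once at the end, lets each earlier match fill in a whole household before the weaker stages are tried.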
Learning Objectives:
Keywords: Ethnicity, Managed Care
Presenting author's disclosure statement:
Qualified on the content I am responsible for because: Researcher in a related field since 1996; multiple submissions and presentations of projects to conferences; received the Lundy award for excellence in related work in 2008.
I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.
Session: Ethnic & Racial Disparities: Patient Perceptions, Methods, Policies, and Best Practices
Section: Medical Care