Best practices for analyzing real-world data: Comparing methodologies for imputing ethnicity in a high volume urban clinic
Monday, November 4, 2013: 11:09 a.m. - 11:26 a.m.
Objective: To compare methodologies for imputing ethnicity from administrative data in a high volume urban clinic. Methods: Using data from 19,165 patient charts in a retrospective study, we compared the accuracy of three methodologies for imputing patient ethnicity: 1) surname analysis based on tabulation from the 2000 U.S. Census; 2) geocoding analysis based on block coding from the 2010 U.S. Census; and 3) a previously published Bayesian approach combining surname and geocoding. The results of each method were compared with the 'gold standard' of patient self-reported ethnicity. Results: Overall agreement between imputed and self-reported ethnicity was fair for surname analysis (kappa=0.23), moderate for geocoding (kappa=0.58), and strong for the combined model (kappa=0.76). Surname analysis determined Asian ethnicity (sensitivity (SE) 80%; positive predictive value (PPV) 77%) and Latino ethnicity (SE 78%; PPV 68%) with reasonable accuracy but had poor reliability for Caucasians (SE 12%; PPV 92%) and African-Americans (SE 96%; PPV 47%). Geocoding determined African-American ethnicity (SE 74%; PPV 89%) and Caucasian ethnicity (SE 91%; PPV 70%) with reasonable accuracy but had poor reliability for Asians (SE 10%; PPV 26%) and Latinos (SE 35%; PPV 41%). The Bayesian approach determined African-American (SE 84%; PPV 94%), Caucasian (SE 92%; PPV 82%), Asian (SE 83%; PPV 79%) and Latino (SE 77%; PPV 71%) ethnicity with the highest accuracy of the three methods. Conclusion: A methodology combining surname analysis and geocoding data is a valid and accurate means of imputing African-American, Caucasian, Asian and Latino ethnicity when self-reported ethnicity is not available.
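The abstract does not spell out the Bayesian combination or the agreement calculations, so the following Python sketch is only illustrative. It assumes that surname and residence block are conditionally independent given ethnicity, so that P(ethnicity | surname, block) is proportional to P(ethnicity | surname) × P(block | ethnicity); the published approach used in the study may differ. The function names (`combine_bayes`, `cohen_kappa`, `sensitivity_ppv`) are hypothetical, and any inputs would have to come from Census surname and block tabulations that are not reproduced here.

```python
from collections import Counter


def combine_bayes(p_eth_given_surname, p_block_given_eth):
    """Combine a surname-based prior with a geocoding likelihood.

    Assumption (not from the abstract): surname and block are
    conditionally independent given ethnicity, so the posterior is
    proportional to P(eth | surname) * P(block | eth).
    """
    unnormalized = {
        eth: prior * p_block_given_eth.get(eth, 0.0)
        for eth, prior in p_eth_given_surname.items()
    }
    total = sum(unnormalized.values())
    if total == 0:
        # No geographic support for any group: fall back to the prior.
        return dict(p_eth_given_surname)
    return {eth: value / total for eth, value in unnormalized.items()}


def cohen_kappa(self_reported, imputed):
    """Cohen's kappa: chance-corrected agreement between two labelings."""
    n = len(self_reported)
    observed = sum(t == p for t, p in zip(self_reported, imputed)) / n
    true_counts = Counter(self_reported)
    pred_counts = Counter(imputed)
    expected = sum(
        true_counts[c] * pred_counts[c] for c in true_counts
    ) / (n * n)
    if expected == 1:
        return 1.0  # both labelings use a single identical category
    return (observed - expected) / (1 - expected)


def sensitivity_ppv(self_reported, imputed, label):
    """Sensitivity and positive predictive value for one ethnicity label."""
    pairs = list(zip(self_reported, imputed))
    true_pos = sum(t == label and p == label for t, p in pairs)
    actual = sum(t == label for t, _ in pairs)
    called = sum(p == label for _, p in pairs)
    sensitivity = true_pos / actual if actual else float("nan")
    ppv = true_pos / called if called else float("nan")
    return sensitivity, ppv
```

In this sketch each patient would be assigned the ethnicity with the largest posterior probability, and the resulting labels would be scored against self-report with `cohen_kappa` and `sensitivity_ppv`. A low sensitivity paired with a high PPV, as reported for Caucasians under surname analysis (SE 12%; PPV 92%), means the method rarely assigns that label but is usually right when it does.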
Diversity and culture
Compare methodologies for imputing ethnicity from administrative data in a high volume urban clinic.
Evaluate alternative methods for collecting ethnicity data when self-reported ethnicity is unavailable.
Keyword(s): Epidemiology, Ethnicity
Presenting author's disclosure statement:
Qualified on the content I am responsible for because: Yang Dai is a biostatistician and public health researcher with significant experience and expertise in epidemiological research. He serves as the biostatistician for numerous grants in the Wills Eye Institute Department of Research, ranging from clinical outcomes to behavioral interventions in ophthalmology.
Any relevant financial relationships? No
I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines,
and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed
in my presentation.