266745 Tracking health disparities through natural language processing

Tuesday, October 30, 2012

Mark Wieland, MD, MPH, Division of Primary Care Internal Medicine, Mayo Clinic, Rochester, MN
Stephen Wu, PhD, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN
Jay Doughty, MS, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN
Vinod Kaggal, MS, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN
Barbara Yawn, MD, MSc, Office of Research, Olmsted Medical Center, Rochester, MN
Background: Closing the gap of racial and ethnic health disparities is a national priority. Disparities and solutions are heterogeneous within and among racial and ethnic groups, yet it is difficult to assess the need for (and impact of) interventions in subset groups at scale because existing healthcare infrastructure lacks the granularity to reflect important sociocultural distinctions. For example, specific immigrant populations (e.g., Somali) are frequently lumped into categories (e.g., African American) that preclude meaningful assessment and monitoring of community or practice-based interventions. Therefore, we applied Natural Language Processing (NLP), a discipline that allows computers to process and understand human languages, to Electronic Medical Records (EMRs) as a tool to potentially bypass these limitations.

Methods: The study was conducted at a large academic medical center in the Midwestern US with a large regional Somali population. We used a rule-based NLP software tool, which searches clinical text in EMRs to produce customized cohorts, to develop an algorithm for identifying Somali patients. Manual chart review served as the “gold standard” for identifying Somali patients, and the NLP algorithm was compared against this standard by calculating sensitivity, specificity, and positive and negative predictive values.
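As an illustration of what a rule-based search over clinical text can look like, the following is a minimal sketch only. The abstract does not describe the actual tool or its rules, so the patterns, the function names (`flag_note`, `build_cohort`), and the sample notes are all hypothetical.

```python
import re

# Hypothetical inclusion rule: the abstract does not list the real trigger terms.
INCLUDE = re.compile(r"\b(?:somali|somalia)\b", re.IGNORECASE)
# Hypothetical exclusion rule: explicit denials such as "not Somali".
EXCLUDE = re.compile(r"\b(?:not|non)[- ]somali\b", re.IGNORECASE)


def flag_note(text: str) -> bool:
    """Return True if a single clinical note matches the inclusion rule
    and does not match the exclusion rule."""
    if EXCLUDE.search(text):
        return False
    return bool(INCLUDE.search(text))


def build_cohort(notes_by_patient: dict[str, list[str]]) -> set[str]:
    """Return the IDs of patients with at least one flagged note."""
    return {
        patient_id
        for patient_id, notes in notes_by_patient.items()
        if any(flag_note(note) for note in notes)
    }


# Invented example notes, for illustration only.
sample_notes = {
    "p1": ["Patient speaks Somali; interpreter present."],
    "p2": ["Follow-up for hypertension."],
    "p3": ["Patient is not Somali."],
}
cohort = build_cohort(sample_notes)
```

A production rule set would likely need more careful handling of negation, family history, and provider names; this sketch only shows the general shape of keyword rules feeding a cohort.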

Results: On a set of 5,782 patients who were seen by primary care physicians within a 15-day interval, our algorithm detected 122 Somali patients. A manual check of the full set of patients showed that the algorithm had excellent sensitivity (92.2%), specificity (99.9%), positive predictive value (97.5%), and negative predictive value (99.8%) for identification of Somali patients.
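The four reported measures all derive from a 2x2 table of algorithm output versus chart review. The abstract does not report that table; the counts below are an assumption, reconstructed as one set of integers consistent with the reported totals (5,782 patients, 122 flagged) and rounded percentages. The sketch shows how the measures follow from such counts.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute standard diagnostic accuracy measures from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # flagged among true Somali patients
        "specificity": tn / (tn + fp),  # not flagged among all other patients
        "ppv": tp / (tp + fp),          # true Somali patients among those flagged
        "npv": tn / (tn + fn),          # other patients among those not flagged
    }


# ASSUMED counts (not reported in the abstract), chosen to match
# 5,782 total patients and 122 flagged by the algorithm.
counts = {"tp": 119, "fp": 3, "fn": 10, "tn": 5650}

metrics = diagnostic_metrics(**counts)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed counts, the algorithm would have missed 10 patients identified by chart review and incorrectly flagged 3.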

Conclusions: In this single-center demonstration project, an NLP algorithm identified patients from a subset immigrant group with high sensitivity, specificity, and predictive values. This technology holds promise for identifying and tracking immigrants and refugees in the United States at a local healthcare level, paving the way for improved patient care and reduction of health disparities.

Learning Areas:
Communication and informatics
Diversity and culture
Public health or related research

Learning Objectives:
1) Describe the use and evaluation of a natural language processing algorithm to identify patients from a subset immigrant/refugee group.
2) Describe the opportunities for natural language processing to track health disparities among subset populations at a local level.

Keywords: Health Disparities, Information Technology

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I am a primary care physician and public health researcher with clinical and research experience among immigrant and refugee populations.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.