221298 Disease surveillance: The need for a robust natural language processor

Monday, November 8, 2010

Julio C. Silva, MD, MPH, Department of Information Services, Rush University Medical Center, Chicago, IL
Marilyn M. Hallock, MD, MS, Department of Emergency Medicine, Rush University Medical Center, Chicago, IL
Dino P. Rumoro, DO, Department of Emergency Medicine, Rush University Medical Center, Chicago, IL
Shital C. Shah, PhD, Department of Health Systems Management, Rush University Medical Center, Chicago, IL
Gillian S. Gibbs, MPH, Department of Emergency Medicine, Rush University Medical Center, Chicago, IL
Jamil D. Bayram, MD, MPH, EMDM, Department of Emergency Medicine, Rush University Medical Center, Chicago, IL
Michael J. Waddell, PhD, Pangaea Information Technologies, Ltd, Chicago, IL

BACKGROUND: Real-time disease surveillance is critical for rapid diagnosis and public health intervention, but such a system must accurately process both coded and free-text clinical information. Geographic Utilization of Artificial Intelligence in Real-Time for Disease Identification and Alert Notification (GUARDIAN) is a real-time, automated, knowledge-based disease surveillance system.

OBJECTIVE: The goal of this study is to evaluate GUARDIAN's natural language processor (NLP) algorithm. GUARDIAN's NLP is a modified version of MetaMap Transfer (MMTx), a software component developed by the National Library of Medicine.

METHODS: The GUARDIAN NLP and an unmodified MMTx (uMMTx) were each tested against a gold-standard (GS) physician chart review, using data for 1,122 emergency department patients seen between November 1 and 7, 2009. The GUARDIAN NLP, uMMTx, and the GS each searched for words associated with the influenza-like illness (ILI) case definition (fever, cough, and sore throat), plus the presence of fever as a vital sign, and each of the three methodologies assigned an ILI status.

RESULTS: Performance statistics for GUARDIAN and uMMTx were calculated against the GS. ILI diagnosis results for GUARDIAN (uMMTx) were as follows: positive predictive value (PPV) 97.8% (35.88%), negative predictive value (NPV) 88.5% (99.38%), sensitivity 62.6% (99.31%), specificity 99.5% (38.42%), accuracy 90% (54.10%). The average times per chart review to determine the ILI status by GUARDIAN, uMMTx, and the licensed physician were 1.9, 0.8, and 3.0 minutes, respectively.

DISCUSSION/CONCLUSIONS: GUARDIAN is comparable in accuracy to a licensed physician's manual review of charts (and considerably more accurate than the uMMTx) while using fewer resources, as measured by time spent reviewing records. With further refinements, GUARDIAN will be able to translate both coded and free-text clinical information more accurately in real time.
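
For readers less familiar with the measures reported in RESULTS, the following is a minimal sketch of how PPV, NPV, sensitivity, specificity, and accuracy are conventionally derived when a classifier's ILI labels are compared against a gold standard. It is not the study's code. The names `ConfusionCounts`, `count_outcomes`, and `diagnostic_metrics` are hypothetical, and the labels in the example are made-up toy values, not study data.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 table: classifier output versus gold standard."""
    tp: int  # classifier ILI, gold standard ILI
    fp: int  # classifier ILI, gold standard not ILI
    tn: int  # classifier not ILI, gold standard not ILI
    fn: int  # classifier not ILI, gold standard ILI


def count_outcomes(gold: Sequence[bool], predicted: Sequence[bool]) -> ConfusionCounts:
    """Tally the four cells from paired per-patient labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    tn = sum((not g) and (not p) for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard diagnostic-accuracy measures, as fractions between 0 and 1."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "ppv": _ratio(c.tp, c.tp + c.fp),          # positive predictive value
        "npv": _ratio(c.tn, c.tn + c.fn),          # negative predictive value
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
        "accuracy": _ratio(c.tp + c.tn, total),
    }


# Toy illustration only (NOT study data): 200 hypothetical patients.
toy_counts = ConfusionCounts(tp=40, fp=10, tn=140, fn=10)
toy_metrics = diagnostic_metrics(toy_counts)
```

The formulas show why the measures can diverge for a single tool: a classifier that flags nearly every chart as ILI will have high sensitivity and NPV but low specificity and PPV, which is the pattern reported for uMMTx.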

Learning Areas:
Communication and informatics
Epidemiology
Public health administration or related administration

Learning Objectives:
1. Describe the importance of automated, real-time analysis of free-text health information for a disease surveillance system.
2. Discuss the required modifications to the standard National Library of Medicine NLP to analyze free-text and clinical notes accurately.
3. Explain GUARDIAN’s surveillance system capabilities.

Keywords: Surveillance, Information Systems

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I am the Chief Medical Information Officer and the co-principal investigator of the GUARDIAN grant project discussed in the abstract.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.