269242 Natural Language Processing of Portable Chest X-Ray Reports for Infection Surveillance

Monday, October 29, 2012

Dan Wang, PhD, Veterans Affairs Palo Alto Health Care System, Palo Alto, CA
Daniel Rubin, MD, MS, Department of Radiology, Stanford University, Stanford, CA
Dallas Chambers, BS, Veterans Affairs Palo Alto Health Care System, Palo Alto, CA
Justin Chambers, BS, Veterans Affairs Palo Alto Health Care System, Palo Alto, CA
Brett South, MS, IDEAS Center, Veterans Affairs Salt Lake City Health Care System, Salt Lake City, UT
Tammy Hwang, BA, Veterans Affairs Palo Alto Health Care System, Palo Alto, CA
Mary K. Goldstein, MD, MS, Geriatrics Research Education and Clinical Center (GRECC), VA Palo Alto Health Care System, Palo Alto, CA

ICU patients often have portable chest X-ray imaging taken during their hospital stay. Portable chest X-ray (CXR) reports routinely describe the presence, insertion, or removal of medical devices that are commonly associated with blood-borne infections. The ability to mine repositories of radiology reports in connection with clinical data could enable epidemiological research, such as correlating the frequency of infections with the length of time that certain devices are present. Our aim was to further develop a natural language processing (NLP) system to extract structured data from unstructured CXR reports for application to infection surveillance. We developed an NLP system, Chest X-Ray Device Extractor (CXDE), which analyzes reports in two steps using the GATE framework. The first step separates text into individual sentences and identifies device names and words or phrases that indicate device status. The second step analyzes the results of the first step to infer the status of each device identified in the report. A 500-report corpus was independently annotated by a group of human annotators to serve as a reference standard. CXDE was evaluated against the reference standard using precision and recall, calculated as follows: recall = TP/(TP + FN) and precision = TP/(TP + FP), where TP, FP, and FN denote true positives, false positives, and false negatives. The final reference set had 800 device terms. 94/800 were identified as recently inserted; 70/800 were identified as removed; and 624/800 were identified as present. After iterative development, CXDE identified device mentions with recall and precision of 95% and 98%, respectively. The present status type was identified with recall and precision of 92% and 93%, respectively; for the insertion status type, recall and precision were 88% and 89%; and for the removal status type, they were 90% and 94%.
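The recall and precision definitions above can be written as a short function. This is a minimal sketch in Python; the function name and the example counts are hypothetical and are not taken from the study.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from counts of true positives,
    false positives, and false negatives.

    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    A zero denominator yields 0.0 rather than raising an error.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for illustration only (not results from the abstract).
precision, recall = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In an evaluation like the one described, the counts would come from comparing system output with the human-annotated reference standard, computed separately for device mentions and for each status type.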
CXDE has the potential to enable efficient and accurate infection surveillance by automating the detection of lines/devices that are mentioned in radiology reports.
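To make the two-step analysis concrete, the following is a rough Python sketch of the same idea: split a report into sentences and tag device terms and status cues, then infer a status for each device. It is not the CXDE implementation, which is built on the GATE framework; the device list, the cue phrases, the function names, and the rule that removal cues take priority over insertion cues are all assumptions made for illustration.

```python
import re

# Hypothetical lexicons for illustration only; not the CXDE vocabularies.
DEVICE_TERMS = ["endotracheal tube", "central venous catheter",
                "picc line", "chest tube"]
STATUS_CUES = {
    "inserted": ["interval placement", "has been placed", "new"],
    "removed": ["removed", "removal", "no longer seen"],
}


def split_sentences(text):
    """Step 1a: naive sentence splitting on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def contains_phrase(phrase, lowered_text):
    """Whole-word match of a phrase inside lowercased text."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", lowered_text) is not None


def annotate(sentence):
    """Step 1b: find device terms and status cue phrases in one sentence."""
    lowered = sentence.lower()
    devices = [d for d in DEVICE_TERMS if contains_phrase(d, lowered)]
    statuses = {status for status, cues in STATUS_CUES.items()
                if any(contains_phrase(c, lowered) for c in cues)}
    return devices, statuses


def infer_status(statuses):
    """Step 2: map the cues found in a sentence to one device status.

    Assumed rule: removal cues win over insertion cues, and a device
    mentioned with no cue is treated as present.
    """
    if "removed" in statuses:
        return "removed"
    if "inserted" in statuses:
        return "inserted"
    return "present"


def extract_devices(report):
    """Run both steps and return a list of (device, status) pairs."""
    results = []
    for sentence in split_sentences(report):
        devices, statuses = annotate(sentence)
        results.extend((device, infer_status(statuses)) for device in devices)
    return results


# Invented example report text, not drawn from the study corpus.
report = ("Interval placement of a right PICC line. "
          "The endotracheal tube has been removed. "
          "Chest tube remains in place.")
print(extract_devices(report))
```

A sentence-level rule like this ignores negation and sentences that mention several devices with different statuses, which is part of why a real system needs iterative development against an annotated reference standard.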

Learning Areas:
Communication and informatics
Epidemiology
Public health or related research

Learning Objectives:
Participants will be able to describe a method for developing natural language processing (NLP) systems that extract data from medical records; to discuss how to evaluate such systems; and to explain the potential value of such systems for automating infection surveillance.

Keywords: Technology, Epidemiology

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I have been the principal investigator of multiple federally funded grants focusing on health informatics. I am a professor of medicine at Stanford University and the Director of the Geriatrics Research Education and Clinical Center (GRECC) at the VA Palo Alto Health Care System. I direct the Primary Care Policy and Practice Advancement program at PCOR, the Stanford/VA Palo Alto Geriatric Medicine Fellowship Program, and the Special Fellowship Program in Advanced Geriatrics at VAPAHCS.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.