Online Program

Open source health intelligence (OSHINT) for food borne illness event characterization

Monday, November 4, 2013

Jane Blake, M.S., Cloud Analytics, Booz Allen Hamilton, McLean, VA
Catherine Ordun, MBA, MPH, Cloud Analytics, Booz Allen Hamilton, Atlanta, GA
Nathanael Rosidi, PhD, Cloud Analytics, Booz Allen Hamilton, Rockville, MD
Vahan Grigoryan, PhD, Cloud Analytics, Booz Allen Hamilton, Rockville, MD
Frederica Conrey, PhD, Cloud Analytics, Booz Allen Hamilton, Rockville, MD
KC Decker, M.S., Cloud Analytics, Booz Allen Hamilton, Atlanta, GA
Background: Immediate, targeted response to outbreaks saves lives and protects national safety. In the early stages of an emergency, public health and medical response can be delayed by a lack of situational awareness. Many early warning biosurveillance systems rely on test results from doctor visits, which can take two weeks to reach analysts. In large-scale outbreaks, two weeks can cost thousands of lives. Web-based data, especially from Twitter, could shorten response timescales from weeks to hours.

Objective: We created and tested an Open Source Health Intelligence (OSHINT) cloud-based prototype that characterizes the number of people sick, dead, and hospitalized from foodborne illness in near real time. The objective is to advance research on event characterization from non-traditional sources in order to supply decision makers with situational awareness.

Methods: We collected Twitter feeds related to Salmonella and Escherichia coli events during 2006–2012 and used term frequency-inverse document frequency (TF-IDF) and natural language processing (NLP) to automatically characterize the number of people sick, dead, and hospitalized. We tested the model against known foodborne disease outbreak data validated by the U.S. Centers for Disease Control and Prevention (CDC) between 2010 and 2012.

Results: We evaluated the precision of the NLP algorithm (e.g., did OSHINT tweets pertain to foodborne illness?) and the accuracy of OSHINT characterizations compared to CDC epidemiological curves and situation reports (e.g., did OSHINT numbers match actual numbers?). For a 2012 multistate tuna Salmonella event, manual review of tweet content indicated that the NLP algorithm identified tweets related to foodborne illness with a precision of 100%, 100%, and 88% for sick, hospitalized, and dead, respectively. The characterization algorithm matched the shape of CDC situation reports but was imperfect when compared to the epidemiological curve.

Conclusion: OSHINT shows value for characterization, forecasting, and detection of outbreaks. We discuss how to narrow the gap between OSHINT and CDC epidemiological curves, and OSHINT's application to natural disasters, mass casualty incidents, and acute mental health concerns.
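
The abstract names TF-IDF as one of the techniques applied to the collected tweets but does not describe the implementation. As a rough illustration only, the sketch below computes textbook TF-IDF weights over a handful of tweets in plain Python. The example tweets, the `tokenize` and `tf_idf` helpers, and the unsmoothed weighting formula are all assumptions made for this sketch, not details of the OSHINT prototype.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?:;\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in tokens if token]

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    This is the plain, unsmoothed textbook form, chosen for illustration.
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical tweets, invented for this example.
tweets = [
    "Salmonella outbreak linked to tuna, 12 sick",
    "Great tuna sandwich for lunch today",
    "Traffic is terrible today",
]
scores = tf_idf(tweets)

# Show the three highest-weighted terms of the first tweet.
for term, weight in sorted(scores[0].items(), key=lambda kv: -kv[1])[:3]:
    print(f"{term}: {weight:.3f}")
```

In this toy corpus, "salmonella" appears in only one tweet and so receives a higher weight than "tuna", which appears in two. This is the property that makes TF-IDF useful for separating outbreak-related terms from everyday food chatter.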

Learning Areas:

Communication and informatics

Learning Objectives:
Describe how social media outlets, such as Twitter, can be used to automatically characterize foodborne illness events in real time.
Discuss how Open Source Health Intelligence (OSHINT) uses natural language processing to extract the numbers of people sick, dead, and hospitalized from foodborne illness.
Identify ways in which OSHINT could be improved or applied to additional public health issues.
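
To make the idea of extracting sick, hospitalized, and dead counts from tweet text more concrete, here is a minimal sketch that uses regular expressions. It is an assumption-laden stand-in, not the NLP method used in OSHINT: the keyword lists, the `extract_counts` function, and the sample tweet are all hypothetical.

```python
import re

# Hypothetical keyword patterns per category (not taken from OSHINT).
CATEGORY_TERMS = {
    "sick": r"(?:sick|ill|sickened|infected)",
    "hospitalized": r"(?:hospitalized|hospitalised)",
    "dead": r"(?:dead|died|deaths?)",
}

def extract_counts(text):
    """Return {category: count} for each category mentioned with a number.

    A number counts toward a category when it is followed, within at most
    two alphabetic words, by one of that category's keywords.
    """
    counts = {}
    for category, terms in CATEGORY_TERMS.items():
        pattern = re.compile(
            r"\b(\d[\d,]*)\s+(?:[A-Za-z]+\s+){0,2}?" + terms + r"\b",
            re.IGNORECASE,
        )
        match = pattern.search(text)
        if match:
            # Remove thousands separators before converting, e.g. "1,200".
            counts[category] = int(match.group(1).replace(",", ""))
    return counts

# Hypothetical tweet, invented for this example.
tweet = "Outbreak update: 12 people sick, 3 hospitalized, 1 dead"
print(extract_counts(tweet))
```

A pattern-based approach like this is easy to audit but brittle: it will misread years or unrelated numbers that happen to precede a keyword, which is one reason manual review of matched tweets, as described in the Results, remains important.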

Keyword(s): Surveillance, Infectious Diseases

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I have over twelve years of experience conducting research, developing methodologies and identifying solutions for biosurveillance challenges. I currently co-lead the development of Open Source Health Intelligence at Booz Allen and was one of the principal developers of Project Global Argus, a million-dollar open source biosurveillance capability developed at Georgetown University and funded by the US Government.
Any relevant financial relationships? Yes

Name of Organization | Clinical/Research Area | Type of relationship
Booz Allen Hamilton | Biosurveillance | Employment (includes retainer)

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.

Session: 3408.0: HIIT Poster Session 4