Online Program

A likelihood-based approach supporting identification of contaminated food using sales data: The effects of shopping behavior on performance

Tuesday, November 3, 2015

Kun Hu, Public Health Research, IBM Almaden Research Center, San Jose, CA
Stefan Edlund, Public Health Research, IBM Almaden Research Center, San Jose, CA
James Kaufman, PhD, Public Health Research, IBM Almaden Research Center, San Jose, CA
Annemarie Kaesbohrer, Bundesinstitut für Risikobewertung, Berlin, Germany
Matthias Filter, Bundesinstitut für Risikobewertung, Berlin, Germany
Objective: Rapid identification of contaminated food is vital to minimizing illness and loss in an outbreak. A likelihood-based approach that maps geocoded sales data against geocoded confirmed case reports has the potential to reduce the time required for investigation [1]. The likelihood “score” defines a binary classifier that yields a set of potentially “guilty” products, which contains the contaminated source with high accuracy. In this work we report the effects of consumer travel on the performance of the method.
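
As a rough illustration of the scoring idea, the sketch below ranks products by how well their geographic sales distribution explains the reported case locations. It is a minimal sketch and not the authors' implementation: the function names, the smoothing constant, the product labels, and the postal codes are all hypothetical, and it assumes each case occurs in the postal area where the product was sold.

```python
import math

def log_likelihood(case_areas, sales_by_area, smoothing=1e-6):
    """Log-likelihood of the case locations given one product's sales.

    Assumption: the chance that a case appears in an area is proportional
    to the product's sales there. `smoothing` (hypothetical) avoids log(0)
    for areas where the product has no recorded sales.
    """
    total = sum(sales_by_area.values())
    if total <= 0:
        raise ValueError("product has no recorded sales")
    score = 0.0
    for area in case_areas:
        share = sales_by_area.get(area, 0.0) / total
        score += math.log(share + smoothing)
    return score

def rank_products(case_areas, sales):
    """Return product names ordered from most to least likely source."""
    scores = {name: log_likelihood(case_areas, by_area)
              for name, by_area in sales.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical sales volumes per postal area for two products.
example_sales = {
    "product_A": {"10115": 80, "20095": 20},
    "product_B": {"10115": 10, "20095": 90},
}
# Hypothetical postal areas of three confirmed cases.
example_cases = ["10115", "10115", "20095"]

print(rank_products(example_cases, example_sales))
```

Thresholding the score (or keeping the top-ranked products) would then give the set of potentially “guilty” products described above.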

Study Design: We use 3 years of weekly sales data for 580 anonymous food products from a German retailer, covering 3,513 of Germany's 8,235 postal areas. In place of the simplest assumption, that outbreaks occur where products are sold [1], a more accurate model accounts for the fact that consumers do not shop only where they reside. A retail “gravity” model is used to test the effects of travel behavior [2].
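
The gravity model can be sketched as follows, assuming the standard Huff form in which the probability of shopping at a store is proportional to its attractiveness divided by distance raised to the exponent k. The function name, the example distances, and the uniform attractiveness values are hypothetical; how the study parameterized attractiveness and distance is not stated here.

```python
def huff_probabilities(distances, attractiveness, k):
    """Huff-type shopping probabilities for one consumer location.

    P_j = (S_j / d_j**k) / sum_m (S_m / d_m**k), where S_j is the store's
    attractiveness and d_j the distance to it. Distances must be positive.
    """
    if len(distances) != len(attractiveness):
        raise ValueError("distances and attractiveness must align")
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive")
    weights = [s / d ** k for s, d in zip(attractiveness, distances)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one store needs positive attractiveness")
    return [w / total for w in weights]

# Hypothetical example: three equally attractive stores at 1, 2 and 4 km.
distances_km = [1.0, 2.0, 4.0]
sizes = [1.0, 1.0, 1.0]
for k in (1, 3):
    probs = huff_probabilities(distances_km, sizes, k)
    print(k, [round(p, 3) for p in probs])
```

In this formulation a larger k concentrates shopping at the nearest store, so smaller values of k correspond to consumers who travel farther.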

Conclusions: ROC curves characterize classifier performance by plotting the fraction of true positives (“sensitivity”) against the fraction of false positives. The area under the ROC curve (AUC) was measured for outbreaks with varying numbers of cases, with the gravity model “travel exponent”, k, varied over the expected range for grocery shopping. The performance of the method degrades as consumer travel distance increases. However, the degradation becomes less pronounced as more case reports are included in the analysis. For k = 3, classifier performance with 50 observed cases versus only a single case dropped 6.2%, while for k = 1 it dropped 36.3%.
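
For readers unfamiliar with the metric, the AUC can be computed directly as the probability that a truly contaminated product receives a higher score than an innocent one. The sketch below uses this pairwise definition with ties counted as one half; the function name and the example scores are hypothetical and are not taken from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `labels` uses 1 for the contaminated product(s) and 0 for all others.
    Each (positive, negative) pair counts 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical likelihood scores for four products; the first is the source.
example_scores = [0.9, 0.8, 0.3, 0.1]
example_labels = [1, 0, 0, 0]
print(roc_auc(example_scores, example_labels))
```

An AUC of 1.0 means the contaminated product always outranks the others, while 0.5 corresponds to random ordering.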

[1] Kaufman J, Lessler J, Harry A, Edlund S, Hu K, Douglas J, Thoens C, Appel B, Kasbohrer A (2014) PLoS Comput Biol 10(11).

[2] Huff DL (1963) A probabilistic analysis of shopping center trade areas. Land Economics 39 (1) 81–90. doi: 10.2307/3144521

Learning Areas:

Public health or related research

Learning Objectives:
Compare the impact of shoppers' travel distances on combating outbreaks of foodborne disease.

Keyword(s): Food Safety

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: Dr. Hu is a research scientist in the Public Health Research group at IBM Almaden Research Center. She is trained as a systems and behavioral scientist with expertise in mathematical modeling of food safety issues. She co-led a project to build a statistical modeling framework that accelerates foodborne outbreak investigations by leveraging food sales data with geographical information. She received her Ph.D. from the Department of Industrial and Systems Engineering at Virginia Tech in 2011.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.