249420 Identifying Health-Related Topics on Twitter: An Exploration of Tobacco-Related Tweets as a Test Topic

Tuesday, November 1, 2011: 11:10 AM

Kyle Prier, BA, Department of Health, Behavior, and Society, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD
Matthew Smith, MS, Department of Computer Science, Brigham Young University, Provo, UT
Christophe Giraud-Carrier, PhD, Department of Computer Science, Brigham Young University, Provo, UT
Carl Hanson, PhD, CHES, Health Science, Brigham Young University, Provo, UT
Public health-related topics are difficult to identify in large conversational datasets such as Twitter. This study examines how to model and discover public health topics and themes in tweets. Tobacco use is chosen as a test case to demonstrate the effectiveness of topic modeling via Latent Dirichlet Allocation (LDA) across a large, representative dataset from the United States, as well as across a smaller subset seeded by tobacco-related queries. Topic modeling across the large dataset uncovers several public health-related topics, although tobacco is not among them. However, topic modeling across the tobacco subset provides valuable insight into tobacco use in the United States. The methods used in this paper provide a possible toolset for public health researchers and practitioners to better understand public health problems through large datasets of conversational data.
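
The two steps described above, seeding a subset with tobacco-related queries and then fitting a topic model to it, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the seed terms, the sample tweets, the function names (tokenize, seeded_subset, lda_gibbs) and the hyperparameters are hypothetical and are not taken from the study. The sampler is a small collapsed Gibbs implementation of LDA in plain Python, which may differ from the inference method and tooling the authors used.

```python
import random
from collections import Counter

# Hypothetical seed terms; the abstract does not list the actual queries.
SEED_TERMS = {"smoking", "cigarette", "tobacco", "hookah"}

def tokenize(text):
    """Lowercase, strip common punctuation, keep alphabetic tokens of 3+ letters."""
    words = (w.strip(".,!?#@\"'").lower() for w in text.split())
    return [w for w in words if len(w) >= 3 and w.isalpha()]

def seeded_subset(tweets, seeds=SEED_TERMS):
    """Keep only tweets that contain at least one seed term."""
    return [t for t in tweets if seeds & set(tokenize(t))]

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns (topic_word, doc_topic): per-topic word counts and
    per-document topic counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    doc_topic = [[0] * n_topics for _ in docs]
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            topic_word[z][w] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                topic_word[z][w] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1
    return topic_word, doc_topic

# Made-up example tweets, for illustration only.
tweets = [
    "Trying to quit smoking, day three without a cigarette",
    "Hookah bar tonight with friends",
    "Flu shot today, my arm is sore",
    "Need a cigarette after that exam",
    "Great run this morning, feeling healthy",
]
subset = seeded_subset(tweets)
docs = [tokenize(t) for t in subset]
topic_word, doc_topic = lda_gibbs(docs, n_topics=2, n_iter=50)
for k, counts in enumerate(topic_word):
    print(k, [w for w, _ in counts.most_common(3)])
```

With only a handful of short documents the resulting topics are not meaningful; the sketch is meant to show the shape of the pipeline. At the scale of millions of tweets, an established LDA implementation with stop-word removal and tuned hyperparameters would be the more realistic choice.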

Learning Areas:
Assessment of individual and community needs for health education
Communication and informatics
Planning of health education strategies, interventions, and programs
Social and behavioral sciences

Learning Objectives:
Learners will be introduced to a computational methodology for analyzing large conversational datasets, specifically Twitter, and for identifying public health-related topics and conversations in such data. This study evaluates how effectively the topic modeling algorithm Latent Dirichlet Allocation (LDA) assesses and identifies tobacco-related conversations among millions of tweets. The methodologies discussed can provide public health practitioners with a possible toolset to a) identify public health problems within online communities and b) design and evaluate more effective interventions for targeted online audiences.

Keywords: Tobacco, Internet

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I have experience in data mining, social networks, and research regarding social influences of health behavior.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.