141st APHA Annual Meeting

287104
Topic discovery using discussion posts in an online cancer community

Monday, November 4, 2013

Kenneth M. Portier, PhD (Biostatistics), American Cancer Society, Atlanta, GA
Greta Greer, LCSW, MSW, Health Promotions Department, American Cancer Society, Atlanta, GA
Lior Rokach, PhD, Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
Nir Ofek, Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
Yafei Wang, College of Information Sciences and Technology, Pennsylvania State University, University Park, PA
Prakhar Biyani, College of Information Sciences and Technology, University Park, PA
Siddhartha Banerjee, College of Information Sciences and Technology, University Park, PA
Prasenjit Mitra, PhD, College of Information Sciences and Technology, University Park, PA
John Yen, PhD (Computer Science), College of Information Sciences and Technology, University Park, PA
We examine online peer-to-peer cancer community discussion boards to learn about issues of importance to people with cancer and cancer caregivers. The ACS Cancer Survivors Network(SM) (CSN), launched in 2000, is the oldest and largest online peer support community for cancer survivors and caregivers, with over 160,000 registered members and 85,063 discussion board posts between 2008 and 2012. Text from forum posts is processed to support topic model analysis, which assumes that each post is associated with one or more underlying latent “topics”. A Bayesian estimation algorithm is used to discover these latent topics and to assign each post a posterior probability of being related to each topic. Practical issues concerning the use and calibration of topic models are discussed, as well as insight gained about the optimal number of topic classes. Topic models are applied to initiating posts from the CSN breast cancer and colorectal cancer discussion forums. The two most frequent topics initiated in the breast cancer forum are “decisions after treatment” (7.7%) and “surgery/mastectomy/reconstruction decisions” (6.4%). The two most frequent topics initiated in the colorectal cancer forum are “lung scan results” (6.4%) and “drugs used in colon cancer treatment” (6.3%). Changes in topics over time and the entropy of topic distributions are also discussed.
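
As a rough illustration of the quantities mentioned above, the sketch below takes per-post topic posterior probabilities and computes the Shannon entropy of each post's topic distribution and the share of posts dominated by each topic. It is not the authors' pipeline. The posterior values are made up. Assigning each post to its highest-probability topic is only one possible way to tally topic frequencies and is an assumption here, as is measuring entropy per post rather than per forum. The function names are hypothetical.

```python
import math
from collections import Counter


def topic_entropy(probs):
    """Shannon entropy (in bits) of one post's topic distribution."""
    # Zero-probability topics contribute nothing to the entropy.
    return sum(-p * math.log2(p) for p in probs if p > 0)


def dominant_topic_shares(posteriors):
    """Fraction of posts whose highest-probability topic is each topic index.

    Assumption: a post is counted under its single most probable topic;
    ties go to the lowest topic index.
    """
    counts = Counter(
        max(range(len(p)), key=p.__getitem__) for p in posteriors
    )
    total = len(posteriors)
    return {topic: n / total for topic, n in counts.items()}


# Hypothetical posteriors for four posts over three topics
# (illustrative values only, not data from the study).
posteriors = [
    [0.80, 0.10, 0.10],
    [0.25, 0.50, 0.25],
    [1.00, 0.00, 0.00],
    [0.50, 0.50, 0.00],
]

entropies = [topic_entropy(p) for p in posteriors]
shares = dominant_topic_shares(posteriors)
```

A post concentrated on a single topic has entropy near zero, while a post spread evenly across topics has higher entropy, so the measure gives one way to summarize how focused the posts in a forum are.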

Learning Areas:
Assessment of individual and community needs for health education
Social and behavioral sciences

Learning Objectives:
Describe topic model analysis using an analysis of ACS Cancer Survivors Network discussion forum posts to illustrate the technique and explore its value in understanding cancer patient, survivor and caregiver needs.

Keywords: Peer Information Network, Assessments

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I have been working at ACS for seven years evaluating its cancer survivor service programs (including CSN), particularly outcomes related to psycho-social parameters. I have 33 years of experience as an applied statistician with interests in multivariate methods, including the text mining tools discussed here. I led the specific analysis for this research and hence understand its statistical, computational, and behavioral implications.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.

Back to: 3406.0: HIIT Poster Session 2