Using a novel software program to improve Twitter data collection methods
Purpose: This study describes the results of using a novel computer software program to improve the collection and analysis of Twitter data.
Methods: Each week over 6 months, we collected all content-relevant tweets worldwide using Personal Zombie, a Twitter data collection software program developed at Drexel University. The software gathers data from a series of searches that are executed at regular intervals through a cloud computing platform. We used 13 keywords related to HPV and the HPV vaccine (for example: “HPV”, “cervical cancer”, “HPV vaccine”, “#HPV”) to collect all tweets that met our search criteria.
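The interval-based searching described above can be sketched as follows. This is a minimal illustration, not Personal Zombie's actual implementation, which the abstract does not describe. The function names, the `search_fn` callable, and the stand-in `fake_search` are all hypothetical; only the four search terms quoted in the abstract are listed, since the remaining nine are not given.

```python
import time

# Only the four terms quoted in the abstract; the other nine are not listed there.
SEARCH_TERMS = ["HPV", "cervical cancer", "HPV vaccine", "#HPV"]


def run_search_round(terms, search_fn):
    """Run one search per term and return {term: [tweets]}."""
    return {term: list(search_fn(term)) for term in terms}


def collect(terms, search_fn, rounds, interval_seconds, sleep_fn=time.sleep):
    """Repeat the per-term searches at a fixed interval and accumulate results.

    search_fn is a hypothetical callable that takes a search term and
    returns an iterable of tweet records (dicts); it stands in for
    whatever search service the real software uses.
    """
    collected = {term: [] for term in terms}
    for round_index in range(rounds):
        for term, tweets in run_search_round(terms, search_fn).items():
            collected[term].extend(tweets)
        # Wait between rounds, but not after the final one.
        if round_index < rounds - 1:
            sleep_fn(interval_seconds)
    return collected


def fake_search(term):
    """Stand-in for a real search call; returns one made-up tweet per term."""
    return [{"id": f"{term}-1", "text": f"example tweet about {term}"}]


results = collect(SEARCH_TERMS, fake_search, rounds=2, interval_seconds=0)
```

Keeping the results separate per term, as here, mirrors the abstract's description of each search term being mined separately.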
Results: Personal Zombie mined all tweets for each of the 13 search terms separately. Over 6 months of data collection, the 13 searches returned a combined 511,464 tweets, each containing at least one search term. The three search terms with the highest tweet volume were HPV (282,354 tweets; 55% of all tweets), cervical cancer (101,171 tweets; 20% of all tweets), and HPV vaccine (40,629 tweets; 8% of all tweets). The term #HPVshot returned the fewest tweets (9 tweets). After the tweets collected for the 13 search terms were merged and de-duplicated, 396,112 unique tweets remained.
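The merge and de-duplication step, and the per-term percentages, can be illustrated with a short sketch. It assumes each tweet record is a dict with a unique `id` field and that duplicates are identified by that ID alone; the abstract does not state how duplicates were detected, and the function names here are hypothetical.

```python
def merge_and_dedupe(per_term_results):
    """Merge per-term tweet lists, keeping one copy of each tweet ID."""
    unique = {}
    for tweets in per_term_results.values():
        for tweet in tweets:
            # The first copy seen is kept; later copies of the same ID are dropped.
            unique.setdefault(tweet["id"], tweet)
    return list(unique.values())


def percent_of_total(count, total):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


# Toy data: the tweet with id 2 matches both search terms.
toy_results = {
    "HPV": [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}],
    "HPV vaccine": [{"id": 2, "text": "b"}, {"id": 3, "text": "c"}],
}
merged = merge_and_dedupe(toy_results)  # three unique tweets remain

# Figures reported in the abstract.
TOTAL_COLLECTED = 511_464
UNIQUE_AFTER_DEDUPE = 396_112
hpv_share = percent_of_total(282_354, TOTAL_COLLECTED)
cervical_cancer_share = percent_of_total(101_171, TOTAL_COLLECTED)
hpv_vaccine_share = percent_of_total(40_629, TOTAL_COLLECTED)
duplicates_removed = TOTAL_COLLECTED - UNIQUE_AFTER_DEDUPE
```

Because a single tweet can match several search terms, the per-term counts sum to more than the number of unique tweets, which is why the percentages are computed against the combined total rather than the de-duplicated one.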
Conclusion: Findings suggest that combining an improved software program with a comprehensive list of search terms yields a pool of tweets that approaches the true population of relevant tweets.
Learning Areas:
Communication and informatics
Public health or related research
Learning Objectives:
Explain the utility of analyzing social media messages and their application to public health practice.
Compare the benefits and limitations of different computer software programs that are available to analyze Twitter data.
Describe how to select keyword terms for mining Twitter data so as to maximize the results of the search.
Keyword(s): Social Media, Data Collection and Surveillance
Qualified on the content I am responsible for because: I am a Co-Investigator on the research study and am intimately involved in the study design, data analysis, and interpretation of results. I wrote the abstract.
Any relevant financial relationships? No
I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.