236913 Integrating Automatic Flagging and Manual Inspection to Efficiently Identify Fraudulent Entries in Online Survey Research

Wednesday, November 2, 2011: 12:30 PM

Jeremy Grey, MA, Division of Epidemiology and Community Health, University of Minnesota, Minneapolis, MN
B. R. Simon Rosser, PhD, MPH, Division of Epidemiology and Community Health, University of Minnesota, Minneapolis, MN
Joseph A. Konstan, PhD, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN
Alex Iantaffi, PhD, Division of Epidemiology and Community Health, University of Minnesota, Minneapolis, MN
The Sexually Explicit Media (SEM) Study aims to explore the relationship between SEM and HIV risk behavior in men who have sex with men (MSM). In 2011, male U.S. residents over the age of 18 who had had sex with a man in the previous five years were recruited for an online survey of SEM use. To help researchers identify invalid entries, Stata was used to automate a protocol for flagging potentially fraudulent responses. Entries were reviewed for duplicate IP, email, and payment addresses as well as last names. Responses to similar items at the beginning and end of the survey (e.g., age, sexual behavior, and zip code) were also compared. For visual inspection, participants' zip codes were plotted on a map to detect whether an abnormally high number of survey responses came from one geographic area. With liberal criteria, such as using Soundex to match different spellings, 240 of the 324 participants (74%) were flagged as potentially fraudulent. Using exact matches to flag duplicates, the number of possibly invalid entries fell to 134 (41%), of which 97 (72%) were due to the time taken to complete the survey. After reviewing the flagged entries manually, only three were determined to be invalid: one for having a non-U.S. PayPal address and two for having duplicate IP addresses. Practical guidelines for combining automated fraud checking in Stata with manual examination of survey responses to efficiently improve validity in Internet-based LGBT health surveys will be discussed.
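The flagging steps described above can be illustrated with a minimal sketch. The study's protocol was automated in Stata; the Python below is not the authors' program. The field names (`ip`, `email`, `last_name`, `age_start`, `age_end`), the helper functions, and the sample entries are all hypothetical, chosen only to show how exact and Soundex-based matching differ and how repeated items can be compared.

```python
from collections import Counter

# Soundex digit for each consonant; vowels, h, w, and y carry no digit.
_SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(name):
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


def flag_duplicates(entries, fields, fuzzy_last_name=False):
    """Return indices of entries that share a value on any listed field."""
    flagged = set()
    for field in fields:
        keys = []
        for entry in entries:
            value = entry[field].strip().lower()
            if fuzzy_last_name and field == "last_name":
                value = soundex(value)  # liberal criterion: match by sound
            keys.append(value)
        counts = Counter(keys)
        flagged.update(i for i, k in enumerate(keys) if k and counts[k] > 1)
    return flagged


def flag_inconsistent(entries, item_pairs):
    """Return indices of entries whose repeated items disagree."""
    return {
        i for i, entry in enumerate(entries)
        if any(entry[a] != entry[b] for a, b in item_pairs)
    }


# Hypothetical entries, invented for illustration only.
entries = [
    {"ip": "192.0.2.1", "email": "a@example.com", "last_name": "Smith",
     "age_start": 34, "age_end": 34},
    {"ip": "192.0.2.2", "email": "b@example.com", "last_name": "Smyth",
     "age_start": 27, "age_end": 27},
    {"ip": "192.0.2.3", "email": "c@example.com", "last_name": "Jones",
     "age_start": 41, "age_end": 25},
]

fields = ["ip", "email", "last_name"]
exact = flag_duplicates(entries, fields)
liberal = flag_duplicates(entries, fields, fuzzy_last_name=True)
inconsistent = flag_inconsistent(entries, [("age_start", "age_end")])
```

In this toy data, exact matching flags no one, while Soundex matching flags the "Smith" and "Smyth" entries together. This illustrates why liberal criteria can mark many more entries for manual review than exact matching does.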

Learning Areas:
Public health or related research

Learning Objectives:
Analyze survey entries for duplicate and suspicious responses. Identify invalid entries based on the protocol. Discuss criteria that more successfully predict fraud in online surveys.

Keywords: Survey, Internet

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I have assisted with several NIH studies involving online survey research and wrote the program that automates the de-duplication and fraud-check protocol.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.