175499 Evaluating Responses to Open-Ended Questionnaire Fields Using Latent Semantic Analysis

Sunday, October 26, 2008

Travis Daniel Leleu, DoD Center for Deployment Health Research, Naval Health Research Center, San Diego, CA
Background: Qualitative data provide epidemiologists with rich information beyond that captured by quantitative data alone. The complexity of open-ended responses on questionnaires, however, often results in data that are difficult to review, collate, and interpret. Automated techniques offer one method of examining and analyzing these types of responses.

Methods: The Millennium Cohort Study, the largest prospective health study in military history, is designed to evaluate the long-term health effects of military service, including deployments. Latent Semantic Analysis (LSA) was applied to the open-ended text fields of Millennium Cohort questionnaires collected between 2001 and 2004 in order to classify and group responses by semantic meaning.
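
As a rough illustration of how LSA can group free-text responses, the following minimal Python sketch builds a term-by-response count matrix, reduces it with a truncated singular value decomposition, and compares responses by cosine similarity in the reduced space. It is not the study's pipeline: the example responses are invented, the helper names (build_term_document_matrix, lsa_document_vectors, cosine_similarity) are hypothetical, and the use of raw counts and k = 2 dimensions are assumptions chosen to keep the example small.

```python
import numpy as np

# Hypothetical free-text responses, invented for illustration (not study data).
responses = [
    "aches and joint pain",
    "joint pain and headaches",
    "anthrax vaccination side effects",
    "side effects from anthrax vaccination",
]


def build_term_document_matrix(docs):
    """Return (vocabulary, counts), where counts[i, j] is how often term i occurs in doc j."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.lower().split():
            counts[index[word], j] += 1
    return vocab, counts


def lsa_document_vectors(counts, k):
    """Project each document into a k-dimensional latent space via truncated SVD."""
    # counts is approximated by U_k S_k V_k^T; the rows of (S_k V_k^T)^T
    # are the document coordinates in the latent space.
    _, s, vt = np.linalg.svd(counts, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T


def cosine_similarity(a, b):
    """Cosine of the angle between two latent-space vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


vocab, counts = build_term_document_matrix(responses)
doc_vectors = lsa_document_vectors(counts, k=2)  # k=2 is an assumption for this toy example

# Similarity between the two pain-related responses, and between a
# pain-related response and a vaccination-related one.
sim_pain_pair = cosine_similarity(doc_vectors[0], doc_vectors[1])
sim_pain_vs_vaccine = cosine_similarity(doc_vectors[0], doc_vectors[2])
print(sim_pain_pair, sim_pain_vs_vaccine)
```

In this toy setting, responses that share vocabulary land close together in the latent space even when their wording differs, which is the property that makes a subsequent clustering step possible. A real analysis would typically add term weighting, stop-word removal, and a data-driven choice of k.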

Results: More than 30,000 meaningful responses were analyzed. Demographically, open-ended responders tended to be active duty, report deployment experience, and rate their general health as low. Responses clustered heavily around a small group of concerns. Some clusters reflect typical health concerns, such as “aches” and “dizziness”, while others address military-specific health concerns or topics, such as “nuclear”, “vaccination”, and “tanks”.

Conclusions: Open-ended text questions provide information that may not be covered in structured instruments. Evaluation of open-ended responses with LSA can help identify participant concerns not directly addressed by survey questions. This may be particularly useful in longitudinal studies, where new clusters identified by LSA could help refine future survey instruments.

Learning Objectives:
1. Discuss the value of using LSA in analyzing open-ended survey responses.
2. Describe demographic characteristics of individuals providing open-ended responses.
3. List three clusters or terms of interest identified using LSA from the Millennium Cohort Study open-ended responses.

Keywords: Information Systems, Technology

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I am the lead on this research project.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.