246279 Use of an innovative meta-data search tool improves variable discovery in large-p data sets like the Simons Simplex Collection (SSC)

Monday, October 31, 2011: 9:10 AM

Leon Rozenblit, JD, PhD, Prometheus Research, LLC, New Haven, CT
Alex Voronoy, Prometheus Research, LLC, New Haven, CT
Matthew Peddle, Prometheus Research, LLC, New Haven, CT
David Voccola, Prometheus Research, LLC, New Haven, CT
Clark Evans, Prometheus Research, LLC, New Haven, CT
Naralys Sinanis, MPH, CHES, Prometheus Research, LLC, New Haven, CT
Stephen B. Johnson, PhD, Biomedical Informatics, Columbia University, New York, NY
Background: The SSC, a large autism data set, includes nearly 6000 phenotype variables. Identifying the variables relevant to a research project can be a challenge. Recent approaches to this problem have focused on developing ontologies, a process that can take years and that requires the user to learn a new, often complex, categorization scheme before getting started.
Learning Objectives: 1. Describe the process for developing an agile software tool that promotes variable discovery in large data sets 2. Assess the value of technological approaches in facilitating autism research and promoting data sharing 3. Discuss how researchers who work with large, complex data sets can adapt this tool.
Methods: We used an agile, collaborative software development methodology, iterating in 2-week cycles for 3 months. We created a method for dynamically generating a structured search index over both data and meta-data, along with a configurable variable report.
Results: Testing with pilot users suggests that Variable Search delivers intuitive and useful results with the SSC. Researchers can use the output to further explore each variable or to build complex queries that return multivariate data sets.
Conclusions: Variable Search can run on top of any relational database, is accessible via the web, and anticipates future integration with ontology efforts. This system can be deployed at low cost on top of other large epidemiological data sources. Variable Search is a promising addition to the set of tools that help epidemiology researchers make sense of very large data sets.
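The abstract does not describe how the search index is built, so the following is only an illustrative sketch of the general idea named in Methods: indexing both meta-data (table and column names) and data values so that a free-text query returns candidate variables. It is not the Variable Search implementation. It assumes a SQLite database, and the table names, column names, and the functions `build_variable_index` and `search_variables` are hypothetical.

```python
import re
import sqlite3
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumerics (this also splits snake_case names)."""
    return [t for t in re.split(r"[^a-z0-9]+", str(text).lower()) if t]


def build_variable_index(conn, max_values=50):
    """Map each token to the set of (table, column) variables it appears in.

    Tokens come from meta-data (table and column names) and from a sample of
    up to `max_values` distinct non-null data values per column.
    """
    index = defaultdict(set)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        for col in conn.execute(f'PRAGMA table_info("{table}")'):
            column = col[1]  # second field of table_info is the column name
            variable = (table, column)
            for token in tokenize(table) + tokenize(column):
                index[token].add(variable)
            rows = conn.execute(
                f'SELECT DISTINCT "{column}" FROM "{table}" '
                f'WHERE "{column}" IS NOT NULL LIMIT ?', (max_values,))
            for (value,) in rows:
                for token in tokenize(value):
                    index[token].add(variable)
    return index


def search_variables(index, query):
    """Return variables ranked by the number of distinct query tokens matched."""
    scores = defaultdict(int)
    for token in set(tokenize(query)):
        for variable in index.get(token, ()):
            scores[variable] += 1
    return sorted(scores, key=lambda v: (-scores[v], v))


# Hypothetical demo schema; these are not actual SSC tables or variables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sleep_questionnaire (
        individual_id TEXT, bedtime_resistance INTEGER, notes TEXT);
    CREATE TABLE medical_history (
        individual_id TEXT, sleep_apnea INTEGER, diagnosis TEXT);
    INSERT INTO sleep_questionnaire VALUES ('P001', 2, 'wakes at night');
    INSERT INTO medical_history VALUES ('P001', 0, 'epilepsy');
""")
index = build_variable_index(conn)
print(search_variables(index, "sleep apnea"))
print(search_variables(index, "epilepsy"))
```

In this sketch the index is rebuilt from the database catalog each time, so it needs no hand-maintained ontology. A production system would more likely use the database's own full-text search facilities and a richer ranking scheme.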

Learning Areas:
Communication and informatics
Epidemiology

Learning Objectives:
1. Describe the process for developing an agile software tool that promotes variable discovery in large data sets
2. Assess the value of technological approaches in facilitating autism research and promoting data sharing
3. Discuss how researchers who work with data sets from underserved populations can adapt this tool.

Keywords: Information Databases, Information Technology

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I lead a team that designs, delivers, and supports sophisticated informatics systems for human biomedical research.
Any relevant financial relationships? Yes

Name of Organization | Clinical/Research Area | Type of relationship
Prometheus Research, LLC | Informatics | Employment (includes retainer) and Stock Ownership

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.