Online Program

Using predictive modeling approaches to examine the relative impact of clinical and public health factors in predicting disease

Tuesday, November 3, 2015

Jay V. Schindler, MPH PhD, Northrop Grumman Corporation, Atlanta, GA
Fred Sieling, PhD, Public Health Operating Unit, Northrop Grumman Corporation, Atlanta, GA

Background: As open data, data sharing, and data integration expand alongside advances in data mining, merging and integrating disparate data sources becomes more feasible and beneficial. Clinical data, aligned with community-based metrics and environmental factors, can form a large, multidimensional data source for testing hypotheses about the relative importance and impact of these different health factors.

Objective/purpose: This presentation will share the systematic approach used by the authors to prepare and align disparate data sources, generate predictive analytics models, down-select and identify the most relevant models for comparing the impact of clinical and public health factors, and determine the relative importance of various factors identified by different predictive modeling tools.
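As a rough illustration of the "prepare and align" step, the sketch below joins patient-level clinical records to community-level metrics on a shared geographic key. The abstract does not name the authors' tools or data fields, so every field name and value here (`county`, `systolic_bp`, `pct_uninsured`) is invented for illustration.

```python
# Hypothetical records: field names and values are made up for illustration.
clinical_records = [
    {"patient_id": 1, "county": "A", "systolic_bp": 142},
    {"patient_id": 2, "county": "B", "systolic_bp": 118},
    {"patient_id": 3, "county": "C", "systolic_bp": 130},
]
community_metrics = {
    "A": {"pct_uninsured": 18.5},
    "B": {"pct_uninsured": 9.2},
}

def align(records, metrics_by_county):
    """Attach community metrics to each clinical record via its county key.

    Records whose county has no community metrics are reported separately
    rather than silently dropped, so coverage gaps stay visible.
    """
    merged, unmatched = [], []
    for rec in records:
        metrics = metrics_by_county.get(rec["county"])
        if metrics is None:
            unmatched.append(rec["patient_id"])
        else:
            merged.append({**rec, **metrics})
    return merged, unmatched

merged, unmatched = align(clinical_records, community_metrics)
```

Tracking unmatched records explicitly is one possible design choice; it makes it easier to judge whether the merged dataset still represents the original clinical population.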

Methods: The authors used various database, data manipulation, and statistical analysis tools to gather, process, and analyze data from open-source and clinical repositories. Cross-validation and training/testing protocols helped substantiate the reliability of the predictive outcomes.
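A minimal sketch of k-fold cross-validation follows, assuming a single-predictor least-squares model and synthetic data. The abstract does not specify the authors' models, fold counts, or software, so the function names and data below are illustrative only.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle row indices once and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def cross_validated_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error across k folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the training rows only, then score on the held-out rows.
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mean((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in held_out))
    return mean(scores)

# Synthetic, made-up data: y depends linearly on x plus unit-variance noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]
cv_mse = cross_validated_mse(xs, ys, k=5)
```

Because every row is held out exactly once, the averaged error estimates how the model performs on data it was not fitted to, which is the sense in which such protocols support the reliability of a prediction.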

Results: Various regression methods and other machine learning toolsets provided insight into which clinical measures help predict specific disease conditions. In addition, a model comparison approach showed that community health conditions and reported public health metrics had a significant impact beyond the effects of clinical indicators.
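One common way to carry out such a model comparison is a nested-model F test: fit a model with clinical predictors only, fit a second model that adds community predictors, and ask whether the drop in residual error exceeds what chance would give. The sketch below assumes a continuous outcome and ordinary least squares for simplicity; the abstract does not state which comparison the authors used, and the data and variable names are synthetic.

```python
import random

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols_rss(rows, ys):
    """Fit least squares with an intercept; return the residual sum of squares."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((y - sum(b * v for b, v in zip(beta, x))) ** 2 for x, y in zip(X, ys))

def nested_f_statistic(rss_reduced, rss_full, extra_params, n, full_params):
    """F statistic for the error reduction gained by the extra predictors."""
    return ((rss_reduced - rss_full) / extra_params) / (rss_full / (n - full_params))

# Synthetic, made-up data: one clinical and one community predictor.
rng = random.Random(7)
n = 200
clinical = [rng.gauss(0, 1) for _ in range(n)]
community = [rng.gauss(0, 1) for _ in range(n)]
outcome = [1.0 + 0.8 * c + 0.5 * m + rng.gauss(0, 1)
           for c, m in zip(clinical, community)]

rss_clinical = ols_rss([[c] for c in clinical], outcome)
rss_both = ols_rss([[c, m] for c, m in zip(clinical, community)], outcome)
f_stat = nested_f_statistic(rss_clinical, rss_both, extra_params=1, n=n, full_params=3)
```

The resulting `f_stat` would be compared against an F distribution with 1 and `n - 3` degrees of freedom to obtain a p-value; that lookup is omitted here. For a binary disease outcome, a likelihood ratio test between nested logistic models plays the analogous role.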

Discussion/Conclusions: Using data mining approaches on merged datasets has helped document the value of examining more than clinical data: community and public health metrics can provide an enhanced understanding of complex, multifactorial disease processes and help guide public health policy and intervention planning.

Learning Areas:

Biostatistics, economics
Chronic disease management and prevention
Clinical medicine applied in public health
Provision of health care to the public
Public health administration or related administration
Public health or related research

Learning Objectives:
Describe how to conduct regression analyses to measure the relative impact of health and community factors.
Explain the merit of using different machine learning approaches to examine clinical and community health data.

Keyword(s): Public Health Research, Community-Based Research (CBPR)

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I am a data scientist and public health researcher examining clinical and community health data associated with a variety of projects. I have taught graduate-level biostatistics and research university courses for over 15 years.
Any relevant financial relationships? Yes

Name of Organization | Clinical/Research Area | Type of relationship
Northrop Grumman Corporation | Public Health | Employment (includes retainer)
Centers for Disease Control and Prevention | Public Health | Independent Contractor (contracted research and clinical trials)

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.