Session
Big Data and Machine Learning for Health Research
APHA 2024 Annual Meeting and Expo
Abstract
Impact of social determinants of health on diabetes outcomes: Insights from a quality improvement intervention among adult patients
Methods: Adults with elevated HbA1c levels (>8.5%) were enrolled in the QI program. Participants were categorized as responders (HbA1c reduction ≥ 0.5%) or non-responders (HbA1c reduction < 0.5%) at follow-up. A total of 41 predictive features were identified from LexisNexis socioeconomic health attributes. ANOVA and Fisher's Exact tests were employed to compare responders versus non-responders. Three machine learning techniques yielded the most efficient features for inclusion in the predictive model.
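The responder rule and the categorical comparison described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' analysis code: the function names, the HbA1c pairs, and the 2x2 counts are all hypothetical, and the Fisher's exact test is the standard two-sided version for a 2x2 table.

```python
from math import comb


def is_responder(pre_hba1c, post_hba1c, threshold=0.5):
    """Return True when HbA1c fell by at least `threshold` percentage points."""
    return (pre_hba1c - post_hba1c) >= threshold


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more likely than the observed one.
    # The small tolerance keeps exact ties from being dropped by rounding.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))


# Made-up (pre, post) HbA1c pairs, for illustration only.
pairs = [(9.0, 8.3), (9.4, 9.2), (10.1, 9.0), (8.9, 8.8)]
print([is_responder(pre, post) for pre, post in pairs])

# Hypothetical counts: rows = responder / non-responder,
# columns = stable address yes / no.
print(fisher_exact_two_sided(3, 1, 1, 3))
```

With real data, each categorical attribute would get its own 2x2 (or larger) table, and continuous attributes would be compared with ANOVA instead.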
Results: The study included 475 individuals from the diabetes program, with a mean age of 55.5 years, mean pre-intervention HbA1c of 9.0%, mean post-intervention HbA1c of 8.3%, and mean change in HbA1c of -0.72%. Significant differences between groups were observed in crime and burglary indices and in address stability (p < 0.05). Final selected features included household characteristics (age and income) and neighborhood-level attributes (income, home values, and crime). Logistic regression demonstrated improved accuracy (0.55), while Support Vector Regression recorded the highest Mean Squared Error (MSE) of 0.32.
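For readers less familiar with the two metrics reported above, the sketch below shows how accuracy and MSE are usually computed. It assumes the standard definitions and uses made-up labels and values; it does not reproduce the study's models. Accuracy is a higher-is-better measure for classifiers, while MSE is a lower-is-better measure for continuous predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly (higher is better)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared gap between observed and predicted values (lower is better)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up responder labels (1 = responder) and predicted labels.
print(accuracy([1, 0, 1, 1], [1, 1, 1, 0]))

# Made-up observed and predicted HbA1c changes, in percentage points.
print(mean_squared_error([-1.0, -0.5], [-0.5, -0.5]))
```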
Discussion: A clinically significant decrease in HbA1c is closely associated with patients’ SDOH factors, collectively accounting for over 30% of intervention outcomes. Future interventions must include social support navigators to identify and address patients’ neighborhood stressors and socioeconomic barriers that may impact intervention outcomes.
Biostatistics, economics; Chronic disease management and prevention; Public health or related research
Abstract
Missing data in high dimensional multilevel data: A hierarchical machine learning approach applied to a national behavioral health study
Supervised ML techniques, including tree-based and random forest models, are used to handle missing data by selecting available data features and using them to estimate imputed values. Cross-validation is used to simultaneously select the appropriate prediction model, train the model, and validate the accuracy of the predicted imputed values. An additional layer of clustering is added to these ML-based methods to obtain accurate predictions of missing values, which will be useful to biomedical and healthcare researchers encountering missing data in their studies. By accounting for different characteristics of the data, this strategy accurately imputes missing data in a hierarchical setting and provides a robust, easy-to-use toolkit for public health researchers who need to predict or impute missing values accurately.
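The idea of adding a cluster layer to imputation can be illustrated with a deliberately simple stand-in. The sketch below replaces the tree-based learners with cluster means (falling back to the overall mean when a cluster has no observed values), so it shows only the hierarchical structure, not the authors' method. The function name, field names, and records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def impute_by_cluster(rows, cluster_key, value_key):
    """Fill missing (None) values with the cluster mean, else the overall mean."""
    observed = [r[value_key] for r in rows if r[value_key] is not None]
    overall = mean(observed)  # raises StatisticsError if nothing is observed

    # Collect the observed values within each cluster.
    by_cluster = defaultdict(list)
    for r in rows:
        if r[value_key] is not None:
            by_cluster[r[cluster_key]].append(r[value_key])

    filled = []
    for r in rows:
        value = r[value_key]
        if value is None:
            members = by_cluster.get(r[cluster_key])
            value = mean(members) if members else overall
        filled.append({**r, value_key: value})
    return filled


# Made-up records: "site" is the cluster, "score" has gaps.
records = [
    {"site": "A", "score": 2.0},
    {"site": "A", "score": 4.0},
    {"site": "A", "score": None},
    {"site": "B", "score": 10.0},
    {"site": "B", "score": None},
    {"site": "C", "score": None},
]
for row in impute_by_cluster(records, "site", "score"):
    print(row)
```

In a fuller version, the cluster mean would be replaced by a model fitted within (or informed by) each cluster, with cross-validation used to choose among candidate models.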
Biostatistics, economics; Social and behavioral sciences
Abstract
Small area estimation methods for county level obesity prevalence estimation: Generalized linear mixed models vs. machine learning techniques
Small Area Estimation (SAE) is a cost-effective methodology that combines multiple data sources to enhance the survey estimator for small geographic areas or subpopulations. As machine learning (ML) continues to gain momentum in data science, more ML applications are being seen in the SAE landscape. The objective of this study was to compare classic generalized linear mixed models (GLMMs) with ML algorithms for SAE of county-level obesity prevalence in Mississippi.
Methods
The 2022 Mississippi Behavioral Risk Factor Surveillance System (BRFSS) data were obtained for this study. The 2020 US census data at the county level were incorporated as auxiliary data to “borrow strength.” GLMMs were constructed and validated using the survey packages in R to account for weights and the BRFSS complex sample design; tree-based ML and neural network models were trained and validated using IBM SPSS Modeler v18.4.0.
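To make the role of survey weights concrete, the sketch below computes a design-weighted direct prevalence estimate for one area. It is an illustration only: the function name and the responses are made up, and it does not reproduce the GLMM fitting or the variance estimation that a complex sample design requires.

```python
def weighted_prevalence(responses):
    """Design-weighted prevalence: sum(w * y) / sum(w) for a binary outcome y."""
    total_weight = sum(weight for _, weight in responses)
    return sum(outcome * weight for outcome, weight in responses) / total_weight


# Made-up (outcome, survey weight) pairs for one hypothetical county;
# outcome 1 = respondent classified as obese.
county_sample = [(1, 100.0), (0, 300.0)]

unweighted = sum(y for y, _ in county_sample) / len(county_sample)
print(unweighted, weighted_prevalence(county_sample))
```

In counties with few respondents, a direct estimate like this is unstable, which is the motivation for model-based estimators that borrow strength from auxiliary data.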
Results
The Mississippi county-level obesity prevalence estimated by GLMMs was validated by a three-fold cross-validation approach. The GLMM estimates showed an adequate range of variation among counties and satisfactory precision, as demonstrated by their standard errors. Gradient-boosted decision trees (GBDTs) generated point estimates similar to those of GLMMs, with slightly wider confidence intervals. Neural network models appeared to be the least accurate and least robust of the three techniques applied.
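A three-fold cross-validation split of the kind mentioned above can be sketched as follows. This is a generic k-fold partition written with the standard library; the function name, the seed, and the placeholder unit identifiers are assumptions, and the abstract does not state how the folds were actually formed.

```python
import random


def k_fold_splits(items, k=3, seed=0):
    """Yield (train, test) lists so that each item is held out exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled items into k folds of near-equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Placeholder unit identifiers, for illustration only.
units = [f"unit_{n}" for n in range(9)]
for train, test in k_fold_splits(units, k=3):
    print(len(train), len(test), sorted(test))
```

Each model is fitted on the training portion and scored on the held-out portion, and the three scores are then averaged.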
Conclusions
The classic GLMMs are among the top choices as an SAE tool. IBM SPSS Modeler makes it intuitive and easy to train and validate GBDTs, which makes it a suitable choice for less experienced analysts starting out with SAE.
Public health or related research
Abstract
Mathematical modeling of emerging and reemerging infectious disease outbreaks to predict ED visit rate: A novel SEITIRD model
Methods: Mathematical modeling, AI modeling, and agent-based modeling were combined to improve the accuracy of predictions and the robustness of the six-compartment S-E-I-TI-R-D model. Influenza and COVID-19 datasets from the NSSP Platform (01/20/2020-02/29/2024) were used. Stability was assessed using the Routh-Hurwitz criterion and eigenvalue analysis of the Jacobian matrix, while the Runge-Kutta method was used to solve the ordinary differential equations numerically.
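As an illustration of the numerical step, the sketch below applies the classical fourth-order Runge-Kutta method to a six-compartment system. The abstract does not give the model equations, so the transitions between S, E, I, TI, R, and D, the parameter names, and all rate values here are hypothetical placeholders chosen only so the example runs; they are not the authors' model.

```python
def rk4_step(f, t, y, h):
    """Advance y' = f(t, y) by one classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
    k4 = f(t + h, [yi + h * k for yi, k in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def seitird_rhs(t, y, beta=0.4, sigma=0.2, tau=0.1, gamma=0.1, rho=0.15, mu=0.01):
    """Hypothetical flows: S->E, E->I, I->TI, I->R, TI->R, TI->D."""
    s, e, i, ti, r, d = y
    n = s + e + i + ti + r + d
    new_exposed = beta * s * i / n
    return [
        -new_exposed,                   # S
        new_exposed - sigma * e,        # E
        sigma * e - (tau + gamma) * i,  # I
        tau * i - (rho + mu) * ti,      # TI
        gamma * i + rho * ti,           # R
        mu * ti,                        # D
    ]


def simulate(f, y0, t_end, h):
    """Integrate from t = 0 to t_end with fixed step h; return the final state."""
    t, y = 0.0, list(y0)
    for _ in range(round(t_end / h)):
        y = rk4_step(f, t, y, h)
        t += h
    return y


# Made-up initial state: 990 susceptible, 10 infectious, 60 days, step 0.1 day.
initial = [990.0, 0.0, 10.0, 0.0, 0.0, 0.0]
final = simulate(seitird_rhs, initial, t_end=60.0, h=0.1)
print([round(v, 1) for v in final])
```

Because the placeholder flows only move people between compartments, the total population stays constant, which gives a quick sanity check on any implementation.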
Results: The model accurately predicted Rc and Ri at 2.31 (0.61) and 1.48 (0.23) respectively, with a combined weekly ED visit percentage of 5.95 (1.38)%. Weekly ED visits for COVID-19 peaked at 115,005 per week during the pandemic, while influenza visits were lower at 42,784 per week. However, from October 2023 to February 2024, influenza visits surpassed COVID-19 by 59.6%. On geographical analysis, the East Coast had higher ED visits for both diseases compared to the West Coast, with 12.7% of the population projected to visit the ED within 60 days after mitigation strategy relaxation.
Conclusion: To date, this is the first study that uses an integrated approach to predict the impact of the rate of spread of combined outbreaks of infectious diseases on ED visits. The study addresses the significance of these approaches in global public health and their roles in preparedness for future challenges.
Biostatistics, economics; Epidemiology; Protection of the public in relation to communicable diseases including prevention or control; Public health or related research