Mortality Rates for Areas Smaller than US Counties: Predictions based on Machine Learning
Methods: We used publicly available county-level data with national coverage from CDC, US Census Bureau, USEPA, USDA, USDOT, FBI, USDHHS, and CMS to build a predictive model of 2010 mortality rates, stratified by age and sex. The dataset included 109,514 observations and 214 predictors. We assumed a Poisson distribution for death counts and used Gradient Boosted Regression Trees (GBRT, a machine learning method) for fitting. The model was evaluated against traditional regression models for count data. The model was validated on two types of data not used for estimation: race/ethnicity-specific death statistics and ZIP-code-level death rates for two urban areas.
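The fitting approach described above can be illustrated with a minimal sketch, which is not the authors' implementation. It boosts one-split trees (stumps) on the log mortality rate under a Poisson likelihood, with population as the exposure, and compares the mean Poisson deviance against a constant-rate baseline. The data are synthetic, the two predictors (an age-group index and a traffic index) are hypothetical stand-ins for the 214 predictors, and all function names, the number of rounds, and the learning rate are assumptions made for illustration.

```python
import math
import random

def poisson_deviance(y, mu):
    """Mean Poisson deviance between observed counts y and expected counts mu."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += 2.0 * (term - (yi - mi))
    return total / len(y)

def leaf_gain(Y, M):
    """Log-likelihood gain from the best constant log-rate shift in one leaf."""
    return (Y * math.log(Y / M) if Y > 0 else 0.0) - (Y - M)

def fit_stump(X, y, mu):
    """Find the single split (feature, threshold) with the largest Poisson gain."""
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            sums = [(sum(y[i] for i in idx), sum(mu[i] for i in idx))
                    for idx in (left, right)]
            gain = sum(leaf_gain(Y, M) for Y, M in sums)
            if best is None or gain > best[0]:
                # Smoothed optimal shift per leaf: log(observed / expected).
                shifts = [math.log((Y + 0.5) / (M + 0.5)) for Y, M in sums]
                best = (gain, j, t, shifts[0], shifts[1])
    return best[1:]

def fit_gbrt(X, y, pop, n_rounds=50, lr=0.3):
    """Boost stumps on the log rate; expected deaths = pop * exp(F)."""
    base = math.log(sum(y) / sum(pop))  # start from the overall crude rate
    F = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        mu = [p * math.exp(f) for p, f in zip(pop, F)]
        j, t, left, right = fit_stump(X, y, mu)
        stumps.append((j, t, lr * left, lr * right))
        F = [f + (lr * left if row[j] <= t else lr * right)
             for f, row in zip(F, X)]
    return base, stumps

def predict_rate(model, row):
    """Predicted deaths per person for one feature row."""
    base, stumps = model
    return math.exp(base + sum(l if row[j] <= t else r for j, t, l, r in stumps))

def sample_poisson(lam, rng):
    """Knuth's method; adequate for the small means used here."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

# Synthetic strata (assumption): features are [age_group, traffic_index].
rng = random.Random(0)
X = [[rng.randint(0, 4), rng.randint(0, 3)] for _ in range(300)]
pop = [rng.randint(1000, 5000) for _ in X]
true_rate = [0.001 * math.exp(0.5 * a + 0.3 * t) for a, t in X]
y = [sample_poisson(p * r, rng) for p, r in zip(pop, true_rate)]

model = fit_gbrt(X, y, pop)
mu_model = [p * predict_rate(model, row) for p, row in zip(pop, X)]
mu_base = [p * sum(y) / sum(pop) for p in pop]
dev_model = poisson_deviance(y, mu_model)
dev_base = poisson_deviance(y, mu_base)
print("baseline deviance:", round(dev_base, 2), "boosted deviance:", round(dev_model, 2))
```

Because the model predicts a rate rather than a count, the same fitted function can be multiplied by the population of any area or subgroup whose predictor values are known, which is the property the validation on ZIP-code-level and race/ethnicity-specific data relies on.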
Results: The GBRT-based model significantly outperformed traditional regression models in terms of fit and validation performance. The most important predictors of mortality were traffic volume and proximity, age, and state of residence; others with some influence were cost of care, tobacco use, rurality, air pollution, social support, nativity, commute, morbidity, and income.
Conclusions: Public health data can be successfully mined for patterns using machine learning methods. The models built using these methods can be applied to predict mortality rates with high spatial resolution or for special populations.
Learning Areas: Biostatistics, economics
Public health or related research
Evaluate the advantages of using machine learning methods, versus traditional regression models, in analyzing public health statistics.
Identify the most influential predictors of county-level patterns in US mortality rates based on a large collection of publicly available data.
Demonstrate the feasibility of predicting US mortality rates with high spatial resolution and/or for special populations.
Keyword(s): Mortality, Statistics
Qualified on the content I am responsible for because: I designed and implemented the analyses that support the content.
Any relevant financial relationships? No
I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.