Session

Machine Learning and HIV Treatment Cascade: Findings from the Big Data Analytics HIV/AIDS Health Utilization Project in South Carolina.

Yu-Hsiang Hsieh, PhD, Department of Emergency Medicine, Johns Hopkins University, Baltimore, MD and Michelle Odlum, EdD, MPH, Columbia University, New York, NY

APHA's 2020 VIRTUAL Annual Meeting and Expo (Oct. 24 - 28)

Abstract

Machine learning approaches to understanding predictors of comorbidity among people living with HIV in electronic health record data

Xueying Yang, PhD (1), Jiajia Zhang, PhD (1), Shujie Chen (1), Sharon Weissman, MD (2), Bankole Olatosi, PhD, MS, MPH, FACHE (1) and Xiaoming Li, Ph.D. (3)
(1) University of South Carolina, Columbia, SC, (2) Columbia, SC, (3) University of South Carolina Arnold School of Public Health, Columbia, SC

Background: Knowledge gaps in understanding the predictors of comorbidity among people living with HIV (PLWH) are barriers to efficient HIV care management. We identified predictors of comorbidity patterns, represented by the Charlson Comorbidity Index (CCI), among PLWH using two machine learning models.

Methods: The study population, extracted through the electronic reporting system of the South Carolina (SC) Department of Health and Environmental Control, comprised PLWH diagnosed between Jan 2005 and Dec 2016 and living in SC. The severity of comorbidity was measured by the dichotomized age-adjusted CCI score (high: CCI >4; low: CCI ≤4), calculated from the weighted sum of the presence of 19 health conditions. Nineteen risk predictors were used to predict the severity of comorbidity with least absolute shrinkage and selection operator (LASSO) regression and classification and regression tree (CART) analysis. The data were split into two sets, with 80% for training and 20% for validation.
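The outcome definition and the 80/20 split described in the Methods can be sketched as follows. This is a minimal illustration, assuming each patient already has an age-adjusted CCI score; the function names, the random seed, and the example scores are hypothetical and are not taken from the study.

```python
import random


def cci_severity(score, cutoff=4):
    """Return 'high' if the age-adjusted CCI score exceeds the cutoff, else 'low'."""
    return "high" if score > cutoff else "low"


def train_validation_split(records, train_fraction=0.8, seed=0):
    """Shuffle the records and split them into training and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(round(len(shuffled) * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical scores for illustration only (not study data).
scores = [2, 3, 4, 5, 7, 4, 9, 2, 6, 3]
labels = [cci_severity(s) for s in scores]
train, validation = train_validation_split(list(zip(scores, labels)))
```

The fitted LASSO and CART models themselves are not reproduced here, since the abstract does not describe their tuning.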

Results: Among 5989 patients, the median CCI score was 4 (range: 2-22). Both models demonstrated good prediction accuracy, with an AUC of 0.76 (95% CI: 0.75-0.77) for the LASSO procedure and 0.72 (95% CI: 0.69-0.75) for CART. Top predictors in both models included older age, higher percentage of retention in care and viral suppression, MSM or IDU, and longer duration of days with low CD4.
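For readers less familiar with the AUC values reported above: the AUC is the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes it directly from that definition. It is an illustration only, with a hypothetical function name and made-up inputs, and it says nothing about how the authors computed their estimates or confidence intervals.

```python
def auc(labels, scores):
    """Area under the ROC curve from pairwise comparisons (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = high CCI) and predicted scores, for illustration only.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two classes.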

Discussion: Machine learning methods could identify the most important predictors of comorbidity among PLWH with high accuracy. The results may enhance understanding of comorbidity and provide data-based evidence for future care management of PLWH.

Chronic disease management and prevention; Epidemiology; Public health or related public policy; Social and behavioral sciences

Abstract

Contextual factors with county-level retention in care status among people living with HIV in South Carolina from 2005 to 2016

Chengbo Zeng, PhD (1), Jiajia Zhang, PhD (2), Xiaowen Sun, M.S. (2), Zhenlong Li, PhD (2), Sharon Weissman, MD (3), Bankole Olatosi, PhD, MS, MPH, FACHE (2) and Xiaoming Li, Ph.D. (1)
(1) University of South Carolina Arnold School of Public Health, Columbia, SC, (2) University of South Carolina, Columbia, SC, (3) University of South Carolina School of Medicine, Columbia, SC

Background: Retention in care (RIC) among people living with HIV (PLWH) is a central component of the HIV care continuum, but few studies have examined the association between area-level risk factors and RIC. This study used machine learning methods to investigate contextual factors associated with county-level RIC rates among PLWH in South Carolina (SC) from 2005 to 2016.

Methods: All PLWH diagnosed with HIV/AIDS between Jan 2005 and Dec 2016 and living in SC were extracted from the electronic reporting system of the SC Department of Health and Environmental Control and aggregated to define county-level RIC status (high: >55.0%; low: <55.0%), where 55.0% is the average federal-level RIC rate from 2010 to 2016 (range: 54.7%-57.6%). A total of 15 county-level factors were extracted from American Community Survey 5-year estimates covering 2005-2016. Least absolute shrinkage and selection operator (LASSO) regression and random forest analysis were used to predict RIC status.

Results: Eight counties (16.3%) had an RIC rate lower than the federal-level average; county-level RIC rates ranged from 6.5% to 86.9% during the study period. Both models demonstrated good prediction accuracy, with an AUC of 0.72 for the random forest and 0.65 for the LASSO procedure. Important contextual predictors of county-level RIC status were poverty proportion, education levels, proportion of health insurance coverage, and unemployment rates.

Conclusions: RIC among PLWH in SC differed geographically, varying with county-level poverty and education level. Interventions tailored to individual counties in SC may be effective in improving HIV treatment and care.
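The aggregation from person-level records to a county-level RIC status can be sketched as below. This is a minimal illustration under stated assumptions: the input format (county name plus a retained flag), the function name, and the county names are hypothetical. The abstract also does not say how a rate of exactly 55.0% is classified, so this sketch arbitrarily treats it as low.

```python
from collections import defaultdict

FEDERAL_AVERAGE_RIC = 55.0  # percent; the cutoff stated in the abstract


def county_ric_status(person_records, cutoff=FEDERAL_AVERAGE_RIC):
    """Aggregate (county, retained) pairs into a per-county rate and status."""
    totals = defaultdict(int)
    retained = defaultdict(int)
    for county, is_retained in person_records:
        totals[county] += 1
        retained[county] += int(is_retained)
    status = {}
    for county, n in totals.items():
        rate = 100.0 * retained[county] / n
        # Assumption: a rate exactly at the cutoff is labeled 'low'.
        status[county] = (rate, "high" if rate > cutoff else "low")
    return status


# Hypothetical records for two made-up counties (not study data).
records = [("A", True), ("A", True), ("A", True), ("A", False),
           ("B", True), ("B", False), ("B", False), ("B", False)]
result = county_ric_status(records)
```

The resulting binary status is the target that the county-level contextual factors would then be used to predict.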

Biostatistics, economics; Chronic disease management and prevention; Planning of health education strategies, interventions, and programs; Social and behavioral sciences

Abstract

Using big data and machine learning to predict missed opportunities for HIV diagnosis in South Carolina

Sharon Weissman, MD (1), Jiajia Zhang, PhD (2), Shujie Chen (2), Bankole Olatosi, PhD, MS, MPH, FACHE (2) and Xiaoming Li, Ph.D. (3)
(1) Columbia, SC, (2) University of South Carolina, Columbia, SC, (3) University of South Carolina Arnold School of Public Health, Columbia, SC

Background:

Early HIV diagnosis is key to ending the HIV epidemic. Big data and machine learning can develop better predictive tools for targeted HIV testing. We developed and validated a prediction model to identify predictors of missed opportunities for HIV testing.

Methods:

The SC enhanced HIV/AIDS Reporting System and records from a statewide all-payer health care (HC) database were linked. The analysis included individuals diagnosed with HIV in SC from 01/2013 to 12/2016 and all of their HC visits from 2005 until HIV diagnosis. Late testers (LT) were defined as individuals with an initial CD4 count <200 cells/mm³. For LT, all HC visits within eight years before HIV diagnosis were counted as missed opportunities. For non-LT, visits occurring within three years before diagnosis were counted. We applied least absolute shrinkage and selection operator (LASSO) regression and classification and regression tree (CART) analysis to identify independent predictors of missed opportunities for HIV diagnosis.
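The late-tester definition and the lookback windows above can be sketched as follows. This is an illustration under stated assumptions: the function names and example dates are hypothetical, and the handling of the window boundaries (the start date is included, a visit on the diagnosis date is excluded) is a choice made for this sketch, not a rule stated in the abstract.

```python
from datetime import date


def is_late_tester(initial_cd4):
    """Late tester: initial CD4 count below 200 cells/mm3."""
    return initial_cd4 < 200


def years_before(d, years):
    """Return the date the given number of years before d."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # Feb 29 mapped to a non-leap year
        return d.replace(year=d.year - years, day=28)


def missed_opportunity_visits(visit_dates, diagnosis_date, late_tester):
    """Keep visits in the lookback window: 8 years for LT, 3 years otherwise."""
    window_start = years_before(diagnosis_date, 8 if late_tester else 3)
    # Assumption: window start is inclusive, the diagnosis date is exclusive.
    return [v for v in visit_dates if window_start <= v < diagnosis_date]


# Hypothetical visit history (not study data).
diagnosis = date(2015, 6, 1)
visits = [date(2006, 1, 1), date(2008, 1, 1), date(2013, 1, 1), date(2015, 6, 1)]
lt_visits = missed_opportunity_visits(visits, diagnosis, is_late_tester(150))
non_lt_visits = missed_opportunity_visits(visits, diagnosis, is_late_tester(450))
```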

Results:

A total of 2693 individuals with newly diagnosed HIV were identified, of whom 743 (27.6%) were LT. Overall, 1987 (73.4%) had at least one HC visit before their HIV diagnosis; the mean number of visits was 6.2. Predictors in both models were age, gender, race/ethnicity, transmission mode, LT status, rural/urban residence, year of diagnosis, and sexually transmitted infection. In both the CART and LASSO procedures, the most important variables were race, LT status, and gender, with AUCs of 0.58 (95% CI: 0.53-0.63) and 0.69 (95% CI: 0.67-0.71), respectively.


Conclusion:

Prediction models using machine learning techniques can identify predictors of “missed opportunities” for HIV diagnosis. These techniques will allow more precise targeting of HIV testing efforts.

Epidemiology; Planning of health education strategies, interventions, and programs; Public health or related research

Abstract

Machine learning modeling framework to predict treatment linkage to care in patients newly diagnosed with HIV in Mecklenburg County, NC

Shi Chen, PhD (1), Michael Dulin, M.D., Ph.D. (2), Yakubu Owolabi, DVM (1), Patrick Robinson, MD, MPH (3), Brian Witt, PhD, MPH (4) and Erika Samoff (5)
(1) University of North Carolina at Charlotte, Charlotte, NC, (2) University of North Carolina Charlotte, Charlotte, NC, (3) The University of North Carolina at Charlotte, Charlotte, NC, (4) Mecklenburg County Public Health, Charlotte, NC, (5) North Carolina Department of Health and Human Services, Raleigh, NC

Background: The national goal of ending the HIV epidemic has not yet been achieved, partly because of delayed linkage to care (LtC). Recent advancements in machine learning (ML) have the potential to help researchers better understand and close the gap in HIV care delivery.

Objectives: We aimed to identify and quantify important risk factors associated with delayed LtC among patients newly diagnosed with HIV using novel ML models.

Methods: Deidentified 2013-2017 Mecklenburg County surveillance data (eHARS) were requested. Univariate analyses were used to quantify associations between delayed LtC (i.e., LtC >30 d after diagnosis) and demographic, epidemiological, geographic, and clinical factors of HIV carriers. ML models, including a random forest model, were then developed and validated in R 3.5.0 on the same data to predict the risk of delayed LtC for individual HIV carriers.
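The delayed-LtC outcome defined in the Methods (linkage more than 30 days after diagnosis) can be sketched as below. The authors worked in R 3.5.0; Python is used here only for illustration. The function name, the example dates, and the decision to treat a missing linkage date as delayed are assumptions of this sketch, not details from the abstract.

```python
from datetime import date

DELAY_THRESHOLD_DAYS = 30  # cutoff stated in the abstract


def is_delayed_ltc(diagnosis_date, first_care_date,
                   threshold_days=DELAY_THRESHOLD_DAYS):
    """Return True when linkage occurred more than threshold_days after diagnosis.

    Assumption: a patient with no recorded linkage date (None) is
    treated as delayed.
    """
    if first_care_date is None:
        return True
    return (first_care_date - diagnosis_date).days > threshold_days


# Hypothetical dates for illustration only (not surveillance data).
on_time = is_delayed_ltc(date(2015, 1, 1), date(2015, 1, 31))   # 30 days
delayed = is_delayed_ltc(date(2015, 1, 1), date(2015, 2, 1))    # 31 days
```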

Results: The type of HIV-diagnosing facility significantly influenced time to LtC; a first diagnosis in a hospital was associated with the shortest time to LtC. Patients with lower CD4 counts (<200 cells/mm³) were twice as likely to be linked to care within 30 d as those with higher CD4 counts. The random forest model achieved high accuracy (>80% without CD4 data and >95% with CD4 data) in predicting individual risk of LtC delay.

Conclusions: This study combines the advantages of interpretable hypothesis-driven methods and state-of-the-art ML methods to achieve a more comprehensive understanding of the challenges behind LtC delays. These findings provide personalized recommendations for individual patients to better understand their own care continuum. They also help public health teams identify high-risk communities across Mecklenburg County.

Assessment of individual and community needs for health education; Clinical medicine applied in public health; Epidemiology; Protection of the public in relation to communicable diseases including prevention or control; Public health or related research

Abstract

Application of machine learning techniques in classification of HIV medical care status for people living with HIV in South Carolina.

Jiajia Zhang, PhD (1), Sharon Weissman, MD (2), Bankole Olatosi, PhD, MS, MPH, FACHE (1), Shujie Chen (1), Xiaoming Li, Ph.D. (3) and Xiaowen Sun, M.S. (1)
(1) University of South Carolina, Columbia, SC, (2) Columbia, SC, (3) University of South Carolina Arnold School of Public Health, Columbia, SC

Background: Ending the HIV Epidemic requires innovative use of data for intelligent decision-making from surveillance through treatment. Using linked data available for South Carolina (SC), we set out to find hidden patterns in linked data useful for predicting future HIV care status for SC PLWH.

Methods: Linked data from the SC enhanced HIV/AIDS Reporting System and a statewide all-payer database, consisting of 233482 observations, were split into three sets (training [40%], validation [30%], and test [30%]). We examined associations between the binary target "care status" (in care vs. not in care) and 43 inputs (explanatory variables). We compared multiple classification algorithms, such as deep neural networks, automated neural networks, decision trees, and regression. We focused on three main goals: future case prediction, hidden input selection, and complexity optimization. We compared models by examining classification performance using standard machine learning measures and receiver operating characteristic (ROC) curves.

Results: Preliminary analyses showed that the combination of inputs most predictive of being not in care was tobacco use, heterosexual, Black, and age. Conversely, inputs most predictive of being in care included age, prior-year care status, schizophrenia, CD4, and transmission risk. ROC performance was best for neural networks, ensembles, and gradient boosting. Trade-offs were required for each model, with neural networks (sensitivity: 60.7%; misclassification: 33.7%) and ensembles (sensitivity: 60.6%; misclassification: 35.2%) being the best classifiers of care status on the validation data.
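The two metrics quoted in the Results, sensitivity and misclassification rate, can be computed from true and predicted labels as sketched below. This is an illustration only: the function name and example labels are hypothetical, and the abstract does not state which class ("in care" or "not in care") was treated as the positive class, so that choice is left as a parameter.

```python
def sensitivity_and_misclassification(y_true, y_pred, positive=1):
    """Return (sensitivity, misclassification rate) for binary labels.

    Sensitivity is the share of true positives that were predicted positive;
    the misclassification rate is the share of all predictions that were wrong.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    true_pos = false_neg = wrong = 0
    for truth, pred in zip(y_true, y_pred):
        if truth != pred:
            wrong += 1
        if truth == positive:
            if pred == positive:
                true_pos += 1
            else:
                false_neg += 1
    if true_pos + false_neg == 0:
        raise ValueError("no positive cases in y_true")
    return true_pos / (true_pos + false_neg), wrong / len(y_true)


# Made-up labels for illustration only (1 = positive class).
sens, misclass = sensitivity_and_misclassification([1, 1, 1, 0, 0],
                                                   [1, 0, 1, 0, 1])
```

Because the two metrics respond differently to the classification threshold, a model can improve on one while worsening on the other, which is the kind of trade-off the abstract refers to.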

Conclusion: These algorithmic applications of neural networks and other machine learning techniques hold significant promise for predicting the future HIV care status of PLWH.

Epidemiology; Public health or related research