Abstract

Using machine learning as a heuristic tool to identify predictors of tuberculosis: A cross-sectional study from Pakistan

Hannah Battey, MS¹, Richard Garfein, PhD² and Timothy Rodwell, MPH, MD²
(1)San Diego State University, San Diego, CA, (2)University of California, San Diego, San Diego, CA

APHA 2025 Annual Meeting and Expo

Background: Beyond clinical diagnostic confirmation, the patient factors most predictive of active tuberculosis (aTB) remain unclear. This study applies machine learning to determine the predictive value of variables for aTB diagnosis.

Objective: To develop and evaluate machine learning-based models for predicting aTB using demographic, symptom, blood marker, and imaging data.

Methods: This cross-sectional study analyzed 795 symptomatic participants in Pakistan, incorporating demographics, symptoms, provider-interpreted chest X-ray (CXR) findings, blood markers, and an AI-generated CXR abnormality score. Participants were classified as aTB or TB-negative based on culture, GeneXpert, or physician diagnosis. Using XGBoost, we developed five predictive models, sequentially integrating variable categories and selecting the 15 most predictive features per model. Model performance was assessed via area under the curve (AUC), sensitivity, specificity, positive predictive value, and negative predictive value. SHapley Additive exPlanations (SHAP) values were used to determine each feature's ranking, magnitude, and direction of effect.
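The performance measures named in the Methods can be illustrated with a minimal sketch. It is not the study's code: the labels and scores below are made-up toy values, the function names are hypothetical, and AUC is assumed here to mean the area under the receiver operating characteristic curve, computed as the fraction of (aTB, TB-negative) pairs in which the aTB case receives the higher score.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 1 = aTB, 0 = TB-negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, and NPV (assumes no denominator is zero)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all aTB cases
        "specificity": tn / (tn + fp),  # true negatives among all TB-negatives
        "ppv": tp / (tp + fp),          # aTB cases among positive predictions
        "npv": tn / (tn + fn),          # TB-negatives among negative predictions
    }


def roc_auc(y_true, scores):
    """Pairwise (Mann-Whitney) estimate of ROC AUC; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only; these are NOT values from the study.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
predictions = [1 if s >= 0.4 else 0 for s in scores]  # cutoff chosen arbitrarily

metrics = diagnostic_metrics(labels, predictions)
auc = roc_auc(labels, scores)
```

Sensitivity, specificity, PPV, and NPV depend on the chosen cutoff, whereas the AUC summarizes ranking quality across all cutoffs; PPV and NPV also shift with the prevalence of aTB in the sample.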

Results: Overall, 22% of participants were diagnosed with aTB. Standalone CXR interpretation achieved 90% sensitivity and 53% specificity, compared with 90% sensitivity and 67% specificity for AI-based scoring (threshold = 0.4). The top-performing model (AUC = 0.917) identified the following key predictors: high AI abnormality score, TB history, positive QuantiFERON, low lymphocyte frequency, and high neutrophil frequency. Decision trees prioritized a high abnormality score threshold (0.76) for sensitivity-specificity balance.
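How a cutoff on the abnormality score trades sensitivity against specificity can be sketched as follows. This is an illustration under stated assumptions, not the study's analysis: the scores and labels are invented toy values, `sens_spec_at` is a hypothetical helper, and a score at or above the cutoff is assumed to count as a positive call. Only the two cutoffs (0.4 and 0.76) are taken from the abstract.

```python
def sens_spec_at(y_true, scores, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data for illustration only; these are NOT values from the study.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]

# Compare the two cutoffs mentioned in the abstract on the toy data.
results = {thr: sens_spec_at(toy_labels, toy_scores, thr) for thr in (0.4, 0.76)}
for thr, (sens, spec) in results.items():
    print(f"threshold={thr}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the cutoff can only keep or reduce the number of positive calls, so specificity never decreases and sensitivity never increases as the threshold rises.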

Conclusion: Machine learning is a useful heuristic tool for understanding the relative importance of TB-related variables. Future research should explore how best to integrate AI-based and traditional diagnostic approaches to optimize TB detection across diverse clinical settings.

Basic medical science applied in public health; Epidemiology; Public health biology