Abstract
Using machine learning as a heuristic tool to identify predictors of tuberculosis: A cross-sectional study from Pakistan
APHA 2025 Annual Meeting and Expo
Objective: To develop and evaluate machine learning-based models for predicting active tuberculosis (aTB) using demographic, symptom, blood marker, and imaging data.
Methods: This cross-sectional study analyzed 795 symptomatic participants in Pakistan, incorporating demographics, symptoms, provider-interpreted chest X-ray (CXR) findings, blood markers, and an AI-generated CXR abnormality score. Participants were classified as aTB or TB-negative based on culture, GeneXpert, or physician diagnosis. Using XGBoost, we developed five predictive models, sequentially integrating variable categories and selecting the 15 most predictive features per model. Model performance was assessed via area under the curve (AUC), sensitivity, specificity, positive predictive value, and negative predictive value. SHapley Additive exPlanations (SHAP) determined feature ranking, magnitude, and direction.
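The performance measures named in the Methods can be illustrated with a minimal sketch. It assumes binary labels (1 = aTB, 0 = TB-negative) and a continuous score that counts as positive when it is at or above a threshold; that cutoff convention, the function names `binary_metrics` and `pairwise_auc`, and the toy data are all hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], scores: Sequence[float],
                   threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV at one score threshold."""
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0  # assumed convention
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def pairwise_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (not study data).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.35, 0.05]
metrics = binary_metrics(toy_labels, toy_scores, threshold=0.4)
auc = pairwise_auc(toy_labels, toy_scores)
```

Sensitivity, specificity, PPV, and NPV depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why both kinds of measure are usually reported together.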
Results: Overall, 22% of participants were diagnosed with aTB. Standalone CXR interpretation had 90% sensitivity and 53% specificity, compared with 90% sensitivity and 67% specificity for AI-based scoring (threshold = 0.4). The top model (AUC = 0.917) identified the following key predictors: high AI abnormality score, TB history, positive QuantiFERON, low lymphocyte frequency, and high neutrophil frequency. Decision trees prioritized a high abnormality score threshold (0.76) for sensitivity-specificity balance.
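To show how a gain in specificity at fixed sensitivity changes predictive values, the sketch below applies Bayes' rule to the sensitivity, specificity, and aTB proportion quoted above. The function name `predictive_values` is hypothetical, and the outputs are illustrative arithmetic under the assumption that the 22% proportion serves as the prevalence; they are not results reported by the study.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) implied by sensitivity, specificity and prevalence."""
    tp = sensitivity * prevalence              # true-positive fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Inputs taken from the abstract; 22% is assumed to act as the prevalence.
cxr_ppv, cxr_npv = predictive_values(0.90, 0.53, 0.22)
ai_ppv, ai_npv = predictive_values(0.90, 0.67, 0.22)

print(f"Provider CXR: PPV={cxr_ppv:.3f}, NPV={cxr_npv:.3f}")
print(f"AI score:     PPV={ai_ppv:.3f}, NPV={ai_npv:.3f}")
```

Because sensitivity is the same for both approaches, the difference in implied PPV comes entirely from the lower false-positive fraction at the higher specificity.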
Conclusion: Machine learning is a useful heuristic tool for understanding the relative importance of TB-related variables. Future research should explore how best to integrate AI-based and traditional diagnostic approaches to optimize TB detection across diverse clinical settings.
Basic medical science applied in public health; Epidemiology; Public health biology