142nd APHA Annual Meeting and Exposition

304681
Improving Prediction Accuracy by Controlling of Unobserved Variables via Incorporating Weakly Correlated Variables Using Unsupervised Random Forest

142nd APHA Annual Meeting and Exposition (November 15 - November 19, 2014): http://www.apha.org/events-and-meetings/annual
Tuesday, November 18, 2014 : 4:30 PM - 4:50 PM

Tao Yang, Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA
Authors: Tao Yang1*, Hong-Wen Deng1, Tianhua Niu1


1Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA 70112, USA
*Corresponding Author

Abstract
Risk predictions are increasingly being applied in clinical reasoning and decision making in modern medicine and epidemiological research. However, risk prediction is inevitably affected by a myriad of factors, which can reduce the accuracy of the resulting prediction model. A particular challenge is to control for the effects of unobserved variables (UVs). To address this problem, we implemented a novel machine learning technique known as unsupervised random forest (RF). Under a range of parameter settings, we performed extensive Monte Carlo simulations mimicking various real-world data scenarios to assess whether an RF ensemble classifier that incorporates a set of weakly correlated variables (WCVs) could improve the accuracy of the prediction model in the presence of UVs. Our simulation results demonstrated that, in almost all simulation scenarios, accounting for the effects of UVs dramatically improved the overall prediction accuracy. Specifically, we found that the magnitude of this improvement depends on the number of WCVs, the correlations among WCVs, and the non-linear/non-additive relationships between WCVs and UVs. Our study suggests that unsupervised RFs could have profound applications in constructing accurate prediction models that handle UVs in clinical and epidemiological studies.
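
The abstract does not say how the unsupervised RF was constructed, so the following is only a minimal sketch of one common construction: the observed WCVs are labeled "real", a synthetic copy is made by permuting each column independently (which keeps the marginal distributions but destroys the dependence among variables), a forest is trained to tell the two apart, and the proximity of two real samples is the fraction of trees in which they share a leaf. The simulated data, the function names (simulate_wcvs, grow, leaf_of, unsupervised_rf_proximity), and all parameter values are assumptions made for illustration, not the authors' settings or code.

```python
import random

def simulate_wcvs(n=100, p=6, rho=0.3, seed=1):
    """Hypothetical data: a latent UV u and p WCVs, each with correlation rho to u."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]
    noise = (1 - rho ** 2) ** 0.5
    wcvs = [[rho * ui + noise * rng.gauss(0, 1) for _ in range(p)] for ui in u]
    return u, wcvs

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    frac = sum(labels) / len(labels)
    return 2.0 * frac * (1.0 - frac)

def grow(rows, labels, idx, depth, rng, max_depth=4, min_leaf=5):
    """Grow one small classification tree on the sample indices in idx."""
    ys = [labels[i] for i in idx]
    if depth >= max_depth or len(idx) < 2 * min_leaf or gini(ys) == 0.0:
        return {"leaf": True}
    n_feat = len(rows[0])
    best = None
    # Try a random subset of features, each with one random threshold.
    for f in rng.sample(range(n_feat), max(1, int(n_feat ** 0.5))):
        t = rows[rng.choice(idx)][f]
        left = [i for i in idx if rows[i][f] <= t]
        right = [i for i in idx if rows[i][f] > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        score = (len(left) * gini([labels[i] for i in left])
                 + len(right) * gini([labels[i] for i in right])) / len(idx)
        if best is None or score < best[0]:
            best = (score, f, t, left, right)
    if best is None:
        return {"leaf": True}
    _, f, t, left, right = best
    return {"leaf": False, "f": f, "t": t,
            "lo": grow(rows, labels, left, depth + 1, rng, max_depth, min_leaf),
            "hi": grow(rows, labels, right, depth + 1, rng, max_depth, min_leaf)}

def leaf_of(tree, row):
    """Identify the leaf a row falls into by its path of left/right turns."""
    path = ""
    while not tree["leaf"]:
        go_hi = row[tree["f"]] > tree["t"]
        path += "1" if go_hi else "0"
        tree = tree["hi"] if go_hi else tree["lo"]
    return path

def unsupervised_rf_proximity(real, n_trees=25, seed=0):
    """Proximity matrix among real samples from a real-vs-permuted forest."""
    rng = random.Random(seed)
    n, p = len(real), len(real[0])
    # Synthetic class: permute every column independently of the others.
    cols = [[row[j] for row in real] for j in range(p)]
    for col in cols:
        rng.shuffle(col)
    synth = [[cols[j][i] for j in range(p)] for i in range(n)]
    rows, labels = real + synth, [1] * n + [0] * n
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_trees):
        boot = [rng.randrange(2 * n) for _ in range(2 * n)]  # bootstrap sample
        tree = grow(rows, labels, boot, 0, rng)
        leaves = [leaf_of(tree, row) for row in real]
        for a in range(n):
            for b in range(n):
                if leaves[a] == leaves[b]:
                    counts[a][b] += 1
    return [[c / n_trees for c in row] for row in counts]

u, wcvs = simulate_wcvs()
prox = unsupervised_rf_proximity(wcvs)
```

How the forest output then enters the risk model is not described in the abstract. One possibility, assumed here, is to cluster the samples on the dissimilarity 1 - prox and add cluster membership to the prediction model as a stand-in for the UV.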

Learning Areas:

Basic medical science applied in public health
Biostatistics, economics
Public health or related research

Learning Objectives:
Design a statistical method to improve risk prediction using weakly correlated variables. Evaluate unsupervised random forests in clustering.

Keyword(s): Biostatistics, Genetics

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I have been supported by Tulane University, Department of Biostatistics and Bioinformatics. I am working on developing statistical models for next-generation sequencing data analysis and on improving disease risk prediction using genetic data.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.