264211 Risk classification with an adaptive naive Bayes kernel machine model

Tuesday, October 30, 2012 : 1:10 PM - 1:30 PM

Jessica Minnier, AM, Department of Biostatistics, Harvard School of Public Health, Boston, MA
Ming Yuan, PhD, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA
Jun Liu, PhD, Department of Statistics, Harvard University, Cambridge, MA
Tianxi Cai, ScD, Biostatistics, Harvard School of Public Health, Boston, MA
Genetics play a major and complex role in many types of diseases. The complexity of the genetic architecture of human health and disease makes it difficult to identify genomic markers associated with disease risk or to construct accurate genetic risk prediction models. Accurate risk assessment is further complicated by the availability of a large number of markers that may be predominantly unrelated to the outcome or may each explain only a small amount of genetic variation. Standard prediction models often rely on additive or marginal relationships between markers and the phenotype of interest. Marginal association-based analysis has limited power to detect associations, while simple additive modeling performs poorly when the underlying associations involve interactions and other nonlinear effects. Additionally, these methods do not use information about genetic pathways or gene structure. We propose a multi-stage method that relates markers to disease risk through gene-sets identified from biological criteria. First, with a naive Bayes kernel machine (KM) model, we estimate gene-set-specific risk models that relate each gene-set to the outcome. Second, we aggregate across gene-sets by adaptively estimating a weight for each set. The KM framework models the potentially non-linear effects of predictors without specifying a particular functional form. Estimation and predictive accuracy are improved by kernel PCA in the first stage and, in the second stage, by adaptive regularization that removes non-informative regions from the final model. Prediction accuracy is assessed with ROC curves and AUC statistics. Numerical studies suggest that the model performs well in the presence of non-informative regions and of both linear and non-linear effects.
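
The two-stage idea described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the kernel, the loss, or the tuning, so everything below is an assumption made for illustration. The sketch uses a Gaussian kernel, ridge regression on the leading kernel PCA components as a stand-in for the gene-set-specific KM fit, an additive combination of set-level scores (in the spirit of a naive Bayes aggregation), an adaptive lasso with squared-error loss for the set weights, and simulated genotypes. All function names (`fit_set`, `predict_set`, `adaptive_weights`, `auc`) and parameter values are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fit_set(X, y, gamma, n_comp=8, ridge=1.0):
    """Stage 1 for one gene-set: kernel PCA, then ridge regression on the components."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    H = np.eye(n) - 1.0 / n                      # centering matrix I - 11'/n
    vals, vecs = np.linalg.eigh(H @ K @ H)       # eigenvalues come back ascending
    vals = np.maximum(vals[::-1][:n_comp], 1e-12)
    vecs = vecs[:, ::-1][:, :n_comp]
    Z = vecs * np.sqrt(vals)                     # training scores on the components
    beta = np.linalg.solve(Z.T @ Z + ridge * np.eye(n_comp), Z.T @ (y - y.mean()))
    return {"X": X, "K": K, "vals": vals, "vecs": vecs, "beta": beta, "gamma": gamma}

def predict_set(m, Xnew):
    """Project new samples onto the fitted components and return the set-level score."""
    Kt = rbf_kernel(Xnew, m["X"], m["gamma"])
    # Center the new kernel rows consistently with the training kernel.
    Kt_c = Kt - m["K"].mean(0)[None, :] - Kt.mean(1)[:, None] + m["K"].mean()
    return (Kt_c @ m["vecs"] / np.sqrt(m["vals"])) @ m["beta"]

def adaptive_weights(S, y, lam=0.01, n_iter=100):
    """Stage 2: adaptive-lasso set weights by coordinate descent (squared-error loss)."""
    n, M = S.shape
    yc = y - y.mean()
    w_init = np.linalg.lstsq(S, yc, rcond=None)[0]
    pen = lam / (np.abs(w_init) + 1e-8)          # weaker initial signal, heavier penalty
    w = np.zeros(M)
    for _ in range(n_iter):
        for j in range(M):
            r = yc - S @ w + S[:, j] * w[j]      # partial residual excluding set j
            rho = S[:, j] @ r / n
            w[j] = np.sign(rho) * max(abs(rho) - pen[j], 0.0) / (S[:, j] @ S[:, j] / n)
    return w

def auc(score, y):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos, neg = score[y == 1], score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

# Simulated data (assumption): 4 gene-sets of 5 SNPs each, coded 0/1/2.
# Set 0 carries a linear effect, set 1 an interaction, sets 2-3 are noise.
rng = np.random.default_rng(0)
n_train, n_test, n_sets, p = 400, 400, 4, 5
G = rng.binomial(2, 0.3, size=(n_train + n_test, n_sets * p)).astype(float)
eta = -0.5 + 2.0 * (G[:, 0] - 0.6) + 1.5 * (G[:, 5] * G[:, 6] - 0.36)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
sets = [slice(j * p, (j + 1) * p) for j in range(n_sets)]
tr, te = slice(0, n_train), slice(n_train, None)

models = [fit_set(G[tr, s], y[tr], gamma=1.0 / p) for s in sets]
S_tr = np.column_stack([predict_set(m, G[tr, s]) for m, s in zip(models, sets)])
S_te = np.column_stack([predict_set(m, G[te, s]) for m, s in zip(models, sets)])
mu, sd = S_tr.mean(0), S_tr.std(0)               # standardize with training moments
w = adaptive_weights((S_tr - mu) / sd, y[tr])
test_auc = auc(((S_te - mu) / sd) @ w, y[te])
print("set weights:", np.round(w, 3))
print("held-out AUC:", round(test_auc, 3))
```

In this sketch the stage-two weights are fit on in-sample stage-one scores, which tends to flatter the noise sets; a more careful version would likely use cross-fitted scores and a likelihood-based loss for the binary outcome.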

Learning Areas:
Biostatistics, economics
Chronic disease management and prevention

Learning Objectives:
1. Formulate a model that can incorporate large numbers of complex effects of genetic markers to predict disease outcomes.

Keywords: Genetics, Biostatistics

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I have been studying this topic for my dissertation research under my advisor, Tianxi Cai, who is very knowledgeable about genetic risk prediction methods in general.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.

Back to: 4248.0: Student Award Presentation