211703 Variable selection in analysis of high-throughput data

Tuesday, November 10, 2009: 5:30 PM

Jianqing Fan, Princeton University, Princeton, NJ
Variable selection in high-dimensional space characterizes many contemporary problems in scientific discovery and decision making. Many frequently used techniques are based on independence screening; examples include correlation ranking and feature selection using a two-sample t-test in high-dimensional classification. Within the context of the linear model, Fan and Lv (2008) showed that this simple correlation ranking possesses a sure independence screening property under certain conditions, and that its revision, called iterative sure independence screening (ISIS), is needed when the features are marginally unrelated but jointly related to the response variable. In this paper, we extend ISIS, without explicit definition of residuals, to a general pseudo-likelihood framework, which includes generalized linear models as a special case. Even in the least-squares setting, the new method improves ISIS by allowing variable deletion in the iterative process. Our technique allows us to select important features in high-dimensional classification where the widely used two-sample t-method fails. A new technique is introduced to reduce the false discovery rate in the feature screening stage. Several simulated and two real data examples are presented to illustrate the methodology.
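To make the idea of correlation ranking concrete, the following is a minimal sketch in Python with NumPy, assuming a plain least-squares setting. It is not the pseudo-likelihood method described in the abstract, which avoids explicit residuals and allows variable deletion; it only illustrates marginal screening and a simplified residual-based iteration in the spirit of the original ISIS. The function names `sis_rank` and `iterative_screen`, the simulated data, and the use of an ordinary least-squares refit between passes (rather than a penalized selection step) are all assumptions made for illustration.

```python
import numpy as np


def sis_rank(X, y, d):
    """Indices of the d features with the largest absolute marginal
    sample correlation with y (correlation ranking)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    # Constant columns (or a constant response) get correlation 0.
    corr = np.divide(Xc.T @ yc, denom,
                     out=np.zeros(X.shape[1]), where=denom > 0)
    return np.argsort(-np.abs(corr))[:d]


def iterative_screen(X, y, d, steps=2):
    """Simplified residual-based iteration (illustrative assumption):
    screen, refit by ordinary least squares on everything selected so
    far, then screen the remaining features against the residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    selected = []
    resid = y
    for _ in range(steps):
        remaining = [j for j in range(p) if j not in selected]
        if not remaining:
            break
        picked = sis_rank(X[:, remaining], resid, d)
        selected.extend(remaining[k] for k in picked)
        # Refit on the selected set and update the residuals.
        A = np.column_stack([np.ones(n), X[:, selected]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
    return selected


# Simulated data (hypothetical): feature 2 is built so that its
# population covariance with y is zero, although it enters the model.
rng = np.random.default_rng(0)
n, p = 300, 40
Z = rng.standard_normal((n, p))
X = Z.copy()
X[:, 2] = 0.5 * Z[:, 0] + np.sqrt(0.75) * Z[:, 2]
y = 4.0 * X[:, 0] - 2.0 * X[:, 2] + 0.5 * rng.standard_normal(n)

top_marginal = sis_rank(X, y, 1)                 # single marginal pass
selected = iterative_screen(X, y, d=1, steps=2)  # two passes
```

In this construction, feature 2 is marginally unrelated to the response at the population level, so a single correlation ranking has no particular reason to favor it. Once feature 0 has been fitted, the residuals depend strongly on feature 2, which is the situation the abstract describes as motivating an iterative procedure.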

Learning Objectives:
To describe variable selection methods in high-dimensional data analysis

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: Fan is a Professor of Statistics at Princeton University and a fellow of both the American Association for the Advancement of Science and the American Statistical Association. He has been working in the area of "Variable selection in analysis of high-throughput data" for over 8 years.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.