Online Program

Effectiveness of Various Variable Importance Methods in A Case Study of Highly Correlated Predictors

Tuesday, November 3, 2015

Jenhao (Jacob) Cheng, PhD, MS, Decision Analytic & Research, Press Ganey Associates, Inc., Elkridge, MD
Weihan Chen, PRESSGANEY INC, Wakefield, MA
Alice Li, MS, Decision Analytic & Research, Press Ganey Associates, Inc., Chicago, IL
Determining variable importance is a common decision-making problem in public health and healthcare: the goal is to identify the key risk factors (or quality measures) that contribute most to an outcome. It becomes a challenging statistical problem when predictors are strongly correlated and there is no clear basis for removing the redundancy, because the unique contribution of each predictor is hard to isolate.

Traditional methods such as correlation and OLS regression are probably the two most popular among public health professionals, but both have inherent limitations. Correlation measures only the marginal association between a predictor and the outcome, without considering the other predictors. OLS takes multiple predictors into account, but its estimates (standardized beta or partial correlation) become unstable when multicollinearity is present.
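The two traditional measures can be sketched as follows. This is a minimal illustration only, not the analysis used in the study: the function names, the synthetic data, and the variance inflation factor diagnostic are all assumptions introduced here to show how near-duplicate predictors destabilize the OLS estimates.

```python
import numpy as np

def _standardize(a):
    """Center and scale to unit (population) standard deviation."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def marginal_correlations(X, y):
    """Pearson correlation of each predictor with the outcome, one at a time."""
    return _standardize(X).T @ _standardize(y) / len(y)

def standardized_betas(X, y):
    """OLS coefficients after standardizing predictors and outcome."""
    beta, *_ = np.linalg.lstsq(_standardize(X), _standardize(y), rcond=None)
    return beta

def variance_inflation_factors(X):
    """Diagonal of the inverse predictor correlation matrix.

    Large values flag predictors whose betas are poorly determined.
    """
    return np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))

# Hypothetical data: x1 and x2 are near-duplicates, x3 is independent.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X = np.column_stack([
    z + 0.05 * rng.normal(size=n),
    z + 0.05 * rng.normal(size=n),
    rng.normal(size=n),
])
y = X.sum(axis=1) + rng.normal(size=n)

print("marginal correlations:", marginal_correlations(X, y))
print("standardized betas:   ", standardized_betas(X, y))
print("VIFs:                 ", variance_inflation_factors(X))
```

In data like this, the two near-duplicate predictors have almost the same marginal correlation with the outcome, while their standardized betas can differ noticeably from one sample to the next; the large variance inflation factors reflect that instability.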

Three more recently developed methods address this inter-correlation issue: relative weight analysis (RW), variable importance in projection (VIP), and correlation-adjusted regression (CAR). RW and VIP share a similar framework, in which importance is measured on transformed, mutually independent principal components and then mapped back to the original predictors through the loading matrix. CAR adjusts the standardized beta using the correlation structure, without transformation.

Although these methods are theoretically more advanced, their effectiveness has rarely been studied. In this small case study with a clear-cut correlation structure, a plausible benchmark for the true importance is available through a sensible dimension-reduction regression and the assumption of similar marginal effects within each reduced dimension. RW is the only method whose ranking is completely consistent with the benchmark, while PLS is similar to OLS and CAR is similar to correlation.

Learning Areas:

Biostatistics, economics
Public health or related public policy
Public health or related research
Social and behavioral sciences

Learning Objectives:
Explain the limitations of traditional statistical methods for determining variable importance when some predictors are strongly correlated.
Demonstrate several recently available advanced statistical methods that address the variable importance problem when significant inter-correlation is present.
Evaluate the effectiveness of each method using a small case study with a well-known correlation structure that provides the benchmark.

Keyword(s): Statistics, Decision-Making

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I am a certified professional statistician and have worked in public health analytics for more than 10 years. My strength is applying advanced statistical methods to public health and healthcare quality problems in order to extract useful policy and decision-making information. I have presented research findings, as oral or poster presentations, at several well-known professional conferences, including APHA, AcademyHealth, and the Joint Statistical Meetings, over multiple years.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.