Use of generalized R-squared in Cox regression

Measures of explained variation, such as the coefficient of determination ($R^2$) in linear models, are helpful in assessing the explanatory power of a model. In survival analysis, these measures help quantify how well prognostic factors predict a patient's time until death. As in linear models, covariates in Cox regression may be statistically significant yet have very little predictive power. With censored data, defining such a measure is not straightforward, and several measures of explained variation have been proposed. The most popular of these is the generalized R-squared, calculated as $R^2 = 1 - \exp(-\chi^2_{LR}/n)$, where $\chi^2_{LR}$ is the chi-square statistic for the likelihood ratio test of the overall model and $n$ is the total number of patients. Although the generalized R-squared is commonly recommended for the Cox model, its sensitivity to the proportion of censored values is not often mentioned. In fact, the expected value of R-squared decreases substantially as the percent censored increases, with early censoring having a greater impact than later censoring. Simulations show that complete-data R-squared values from the Cox model are very close to those from a similar linear model. However, with heavy censoring (e.g., 50% censoring), average R-squared values can decrease by 20% or more in relative terms (e.g., R-squared from 0.5 to 0.4) compared to complete data. Simulation results will be presented, and alternatives to the generalized R-squared will be discussed.
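As a minimal sketch of the formula above, the generalized R-squared can be computed directly from the likelihood ratio chi-square statistic and the number of patients. The function names and the numeric values below are hypothetical, chosen only for illustration; the sketch also assumes the likelihood ratio statistic is obtained as twice the difference between the fitted and null (partial) log-likelihoods, which is the usual definition but is not spelled out in the abstract.

```python
import math


def lr_chi2_from_loglik(loglik_null: float, loglik_model: float) -> float:
    """Likelihood ratio chi-square: 2 * (fitted log-likelihood - null log-likelihood).

    Assumption: for a Cox model these would be partial log-likelihoods.
    """
    return 2.0 * (loglik_model - loglik_null)


def generalized_r2(lr_chi2: float, n: int) -> float:
    """Generalized R-squared: 1 - exp(-lr_chi2 / n), with n the total number of patients."""
    if n <= 0:
        raise ValueError("n must be a positive number of patients")
    if lr_chi2 < 0:
        raise ValueError("the likelihood ratio chi-square cannot be negative")
    return 1.0 - math.exp(-lr_chi2 / n)


# Hypothetical log-likelihoods and sample size, for illustration only.
lr = lr_chi2_from_loglik(loglik_null=-520.0, loglik_model=-495.0)
r2 = generalized_r2(lr, n=200)
print(f"LR chi-square = {lr:.1f}, generalized R-squared = {r2:.3f}")
```

Because n counts all patients, censored or not, while the likelihood ratio statistic is driven mainly by the observed events, a sketch like this makes it easy to see how the same covariate effect could yield a smaller value as censoring increases.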

