260825 Using multiple imputation to enhance the utility of the SEER Summary Stage

Wednesday, October 31, 2012 : 1:10 PM - 1:30 PM

Bin Huang, DrPH MS, Markey Cancer Center, University of Kentucky, Lexington, KY
Brent Shelton, PhD, Markey Cancer Center, University of Kentucky, Lexington, KY
Thomas Tucker, PhD, Markey Cancer Center, University of Kentucky, Lexington, KY
Missing data is a frequent problem in most large medical data sets. Cancer stage is one of the most important variables collected in the SEER data, and ignoring cases with unknown stage may bias an analysis. Multiple Imputation (MI) has become an important and influential approach to the statistical analysis of missing data in recent years. This study will use several variations of MI, implemented in multiple statistical packages, to impute SEER Summary Stage (SS) for breast and liver cancer and examine whether MI generates results that differ from those of conventional methods. The performance of several software packages will be compared under the varying MI scenarios. The study population will include breast and liver cancer cases in the 1995-2008 SEER 17 registry data. Cases younger than 20 years at the time of cancer diagnosis will be excluded. Three SEER SS variables (SEER SS 1977, SEER SS 2000 and CS SEER SS 2000) will be examined. Simulations will be performed on the sub-dataset that includes only cases without missing values, with missingness imposed under various assumed missing data mechanisms. Estimates involving the SEER SS variables will be examined with those variables used as either covariates or dependent variables in statistical modeling. The results of this study will provide further insight into whether MI has utility (compared to standard approaches) when analyzing SEER data with unknown stage, and into whether the MI procedures in several statistical packages perform comparably across various statistical models.
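
As a rough illustration of the simulation design and of the pooling step that MI relies on, the Python sketch below masks stage values in a set of complete cases under an assumed missing data mechanism and then combines per-imputation estimates with Rubin's rules. The abstract does not specify the mechanisms, missingness rates, or software used, so the function names (mask_stage, pool_rubin), the age-based MAR rule, and the toy records are all hypothetical.

```python
import random
from statistics import mean, variance


def mask_stage(cases, rate, mechanism="MCAR", seed=0):
    """Return a copy of `cases` with `stage` set to None for some records.

    MCAR: every case is masked with the same probability `rate`.
    MAR:  the masking probability depends on an observed covariate.
          Here age >= 65 doubles it (capped at 1.0); this rule is an
          illustrative assumption, not taken from the abstract.
    """
    rng = random.Random(seed)
    masked = []
    for case in cases:
        p = rate
        if mechanism == "MAR" and case["age"] >= 65:
            p = min(1.0, 2 * rate)
        stage = None if rng.random() < p else case["stage"]
        masked.append({**case, "stage": stage})
    return masked


def pool_rubin(estimates, variances):
    """Combine m >= 2 per-imputation estimates with Rubin's rules.

    Returns the pooled point estimate and its total variance.
    """
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


if __name__ == "__main__":
    # Toy complete cases; values are made up for illustration only.
    complete = [
        {"age": 45, "stage": "localized"},
        {"age": 70, "stage": "regional"},
        {"age": 66, "stage": "distant"},
        {"age": 52, "stage": "localized"},
    ]
    masked = mask_stage(complete, rate=0.25, mechanism="MAR", seed=1)
    print(sum(c["stage"] is None for c in masked), "of", len(masked), "stages masked")

    # Hypothetical coefficient estimates and variances from m = 3 imputed data sets.
    estimate, total_variance = pool_rubin([0.40, 0.46, 0.43], [0.010, 0.012, 0.011])
    print(estimate, total_variance)
```

Because the true stage is known for every masked case, estimates from the imputed data can be compared against the full-data estimates, which is what makes a complete-case sub-dataset useful for this kind of simulation.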

Learning Areas:
Public health or related research

Learning Objectives:
Evaluate whether using multiple imputation techniques will generate disparate results in analyzing SEER data with unknown stage when compared with common statistical methods. Compare multiple imputation procedures implemented in several statistical software packages.

Keywords: Statistics, Data/Surveillance

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I am an Assistant Professor in the Division of Cancer Biostatistics and the Markey Cancer Center, University of Kentucky. I am also the Director of Population-based Cancer Research at the Kentucky Cancer Registry. I have been the principal or co-principal investigator on multiple federally funded grants focusing on population-based cancer research. My primary research interests are missing data analysis and cancer disparities.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.