The 131st Annual Meeting (November 15-19, 2003) of APHA
Jennifer D. Parker, PhD, Population Epidemiology Branch, National Center for Health Statistics, 6525 Belcrest Road, Hyattsville, MD 20782, 301-458-4419, jdp3@cdc.gov and Bonnie LaFleur, PhD, Department of Preventive Medicine, Vanderbilt University, Vanderbilt University Medical School, Nashville, TN 37232.
Background: Vital statistics are powerful tools for describing infant mortality. Approximately 4 million annual births provide enough power for stable estimates of rates among subgroups, particularly if several years of data are combined. However, these large datasets are cumbersome, and using aggregated data can ease analysis. This study illustrates the impact of data aggregation on two measures of goodness-of-fit, deviance and dispersion (dispersion = deviance / degrees of freedom).
Methods: State and county of residence, maternal education, birthweight (50g intervals), and sex were categorized from the Perinatal Mortality file. Four nested, aggregate datasets were created. The first contained indicators for all combinations of the five categorical variables with corresponding counts of infant deaths and survivors (284,034 records). The last contained the 293 birthweight-sex combinations with counts of deaths and survivors. The remaining two datasets comprised intermediate numbers of records. Poisson regression models of mortality as a function of birthweight and sex were estimated using each aggregated dataset.
Results: Model coefficients and standard errors were identical across datasets. Deviance statistics declined as the number of aggregated observations fell, from 81019 to 4915. The dispersion parameter increased as the number of observations decreased, from 0.28, evidence of an underdispersed model, to 16.96, evidence of an overdispersed model. No model produced a dispersion parameter near 1.0, the value indicative of acceptable fit.
Conclusions: These results are not surprising to analysts familiar with deviance measures; however, this illustration demonstrates the sensitivity of these measures of model fit. These results imply that overdispersion depends as much on the form of the observations as on the model.
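The aggregation effect described above can be sketched with a small, self-contained example. Everything in it is an assumption made for illustration: the toy data, the six birthweight categories, the eight strata, a model with a single linear birthweight term, and the function names (make_fine_records, aggregate, fit_poisson, deviance). It does not use the Perinatal Mortality file or the models from this study, and it is not meant to reproduce the reported values of 0.28 and 16.96. It fits a Poisson model with a log(births) offset by Newton-Raphson, first to stratum-level records and then to the same records summed within birthweight category, and reports coefficients, deviance, and dispersion for each.

```python
import math

def make_fine_records():
    """Hypothetical toy data: one (birthweight_index, deaths, births) per stratum."""
    rows = []
    for x in range(6):                       # six birthweight categories (assumed)
        rate = 0.2 * math.exp(-0.9 * x + 0.12 * x * x)
        for s in range(8):                   # eight strata per category (assumed)
            births = 200 + 25 * s
            factor = 0.85 + 0.05 * (s % 4)   # mild stratum-to-stratum variation
            rows.append((x, int(births * rate * factor), births))
    return rows

def aggregate(rows):
    """Sum deaths and births over strata within each birthweight category."""
    cells = {}
    for x, d, n in rows:
        dd, nn = cells.get(x, (0, 0))
        cells[x] = (dd + d, nn + n)
    return [(x, d, n) for x, (d, n) in sorted(cells.items())]

def fit_poisson(rows, iters=50):
    """Fit log(mu) = log(births) + b0 + b1*x by Newton-Raphson."""
    total_d = sum(d for _, d, _ in rows)
    total_n = sum(n for _, _, n in rows)
    b0, b1 = math.log(total_d / total_n), 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, d, n in rows:
            mu = n * math.exp(b0 + b1 * x)
            g0 += d - mu                     # score for the intercept
            g1 += (d - mu) * x               # score for the slope
            h00 += mu                        # information matrix entries
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-12:
            break
    return b0, b1

def deviance(rows, b0, b1):
    """Poisson deviance: 2 * sum(d*log(d/mu) - (d - mu)), with 0*log(0) = 0."""
    dev = 0.0
    for x, d, n in rows:
        mu = n * math.exp(b0 + b1 * x)
        term = d * math.log(d / mu) if d > 0 else 0.0
        dev += 2.0 * (term - (d - mu))
    return dev

fine = make_fine_records()
agg = aggregate(fine)

b_fine = fit_poisson(fine)
b_agg = fit_poisson(agg)
dev_fine = deviance(fine, *b_fine)
dev_agg = deviance(agg, *b_agg)

# Dispersion = deviance / residual degrees of freedom (records minus 2 parameters).
for label, rows, b, dev in (("fine", fine, b_fine, dev_fine),
                            ("aggregated", agg, b_agg, dev_agg)):
    df = len(rows) - 2
    print(label, "records:", len(rows), "coefficients:", b,
          "deviance:", dev, "dispersion:", dev / df)
```

In this sketch the two fits return the same coefficients up to floating-point error, because summing over strata leaves the totals that enter the likelihood equations unchanged. The deviance cannot increase under aggregation when the fitted rates are the same, while the dispersion may move in either direction, since the degrees of freedom shrink along with the deviance.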
Learning Objectives:
Keywords: Biostatistics, MCH Epidemiology
Presenting author's disclosure statement:
I do not have any significant financial interest/arrangement or affiliation with any organization/institution whose products or services are being discussed in this session.