American Public Health Association
133rd Annual Meeting & Exposition
December 10-14, 2005
Philadelphia, PA
 
4242.0: Tuesday, December 13, 2005 - 3:10 PM

Abstract #114343

Improving estimates of SF-36 profiles and summary measures for use in health services research

John Ware, PhD, Mark R. Kosinski, MA, Jakob B. Bjorner, MD PhD, and Xiaowu Sun, PhD. QualityMetric Incorporated, 640 George Washington Highway, Lincoln, RI 02865, (401) 334-8800, ext. 226, jware@qualitymetric.com

Background: The “evidence” in evidence-based health care is increasingly based on information about self-reported health status, and the standardization of measures across population surveys, clinical trials and individual patient-level assessments is greatly facilitating the comparison of results from health services research. To achieve measures that better meet the requirements of each research application (e.g., population survey versus individual patient-level assessment), we developed a “family” of static short-form (SF) surveys (SF-36, SF-12 and SF-8) and dynamic assessments using item response theory (IRT) and computerized adaptive testing (CAT) software, all standardized to estimate the familiar SF-36 profiles and summary physical and mental health measures. While having the same mean and variance for each health domain, the survey methods vary in precision and range of measurement, from the coarsest SF-8 single-item estimates, suitable for large population surveys, to the most precise CAT-based estimates, required for studies of individual patients.
Methods: A series of general population surveys (N=12,050) was conducted to evaluate and compare each survey method for each of the eight SF-36 health domains. The item banks included the SF-36, SF-12, and SF-8 Health Surveys, as well as items from 52 other surveys measuring the same health domains. IRT methods were used to cross-calibrate and score the items in each item bank on a common scale, and T-score transformations standardized the scores to have the same means and variances in the general population (norm-based scoring). The comparability of scores from the SF-8, SF-12, SF-36 and CAT administrations was assessed, and respondent burden was compared.
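As a rough illustration of the norm-based scoring step described in the Methods, the sketch below rescales scores so that a reference population has a fixed mean and standard deviation. It assumes the conventional T-score metric (mean 50, standard deviation 10). The function name, the reference sample and its values are hypothetical and are not taken from the study's software or data.

```python
from statistics import mean, pstdev


def norm_based_t_scores(scores, norm_mean, norm_sd):
    """Linearly rescale scores so the reference population has mean 50, SD 10.

    norm_mean and norm_sd are the mean and standard deviation of the same
    scores in the reference (general) population. This is a hypothetical
    helper for illustration only.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return [50.0 + 10.0 * (x - norm_mean) / norm_sd for x in scores]


# Made-up scores on an arbitrary common scale (not study data).
reference_sample = [-1.2, -0.4, 0.0, 0.3, 1.3]
norm_mean = mean(reference_sample)
norm_sd = pstdev(reference_sample)  # population SD of the reference sample

t_scores = norm_based_t_scores(reference_sample, norm_mean, norm_sd)
```

Because the transformation is linear, it changes only the metric of the scores. Differences in precision between shorter and longer forms are unaffected by it.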
Results: For each domain, scale scores from the SF-8, SF-12, SF-36 and CAT were highly correlated (r>0.75) with one another, and mean scores never differed by more than 1 to 2 points in the general population or across self-reported condition groups. Score estimates differed mainly in their precision, with CAT scores being the most precise and SF-8 scores the least precise. Precise scores from CAT assessments were achieved with a 70% to 80% reduction in respondent burden.
Conclusion: IRT-based cross-calibration and SF-36 norm-based scoring methods enable the comparison of scores across the SF-8, SF-12, SF-36 and CAT surveys. We discuss implications for health services research, including large population surveys, clinical trials and applications requiring individual patient-level assessments in clinical practice.

Learning Objectives:

Keywords: Health Assessment, Outcome Measures

Presenting author's disclosure statement:

I wish to disclose that I have NO financial interests or other relationship with the manufacturers of commercial products, suppliers of commercial services or commercial supporters.


Health Services Research Contributed Papers #3
