Reliability of a rating scale to measure severity of adverse healthcare events

Anatchkova, Milena

289070

Monday, November 4, 2013 : 2:30 p.m. - 2:45 p.m.

Milena Anatchkova, PhD, Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA

Polina Harik, PhD, National Board of Medical Examiners, Philadelphia, PA

Kathleen M. Mazor, EdD, Meyers Primary Care Institute, University of Massachusetts Medical School, Worcester, MA

Joann Wagner, MSW, Meyers Primary Care Institute, University of Massachusetts Medical School, Worcester, MA

Deborah Perfetto, PharmD, Agency for Healthcare Research and Quality, Rockville, MD

Colleen Biggins, BA, Meyers Primary Care Institute, University of Massachusetts Medical School, Worcester, MA

Cassandra L. Firneno, BA, Meyers Primary Care Institute, University of Massachusetts Medical School, Worcester, MA

Kathleen Walsh, MD, MSc, Meyers Primary Care Institute, University of Massachusetts Medical School, Worcester, MA

Robert Klugman, MD, Meyers Primary Care Institute, University of Massachusetts Medical School, Worcester, MA

Jennifer Tjia, MD, MSCE, Meyers Primary Care Institute, University of Massachusetts Medical School, Worcester, MA

Objective: To describe the inter-rater reliability (IRR) of the Agency for Healthcare Research and Quality (AHRQ) Harm Scale for measuring severity and duration of adverse events in healthcare, and to describe IRR variation by clinical case type, severity of harm (death, severe, moderate, mild, no harm), and rater specialty and level of experience. A secondary aim is to identify the contribution of case type, harm severity, and rater specialty and experience to measurement error.

Methods: We conducted a reliability assessment using 50 adverse event case descriptions for each of 8 Common Format case types (medication, perinatal, blood product, device, fall, healthcare-associated infection [HAI], pressure ulcer, surgical) used by the federally mandated National Patient Safety Database. Nine clinicians representing 3 clinical specialties (physicians, nurses, and pharmacists) and 3 levels of adverse event evaluation experience (expert, moderate, novice) rated the same 400 cases after a standardized training session on the application of the Harm Scale. IRR was evaluated with free-marginal multirater kappa. Generalizability analysis was used to identify sources of error.
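
As an illustration of the agreement statistic named above, the following is a minimal sketch of free-marginal multirater kappa, assuming every case is rated by the same number of raters and that chance agreement is 1/k for k categories. The function name, data layout, and example ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def free_marginal_kappa(ratings, n_categories):
    """Free-marginal multirater kappa.

    ratings: list of cases, each a list with one category label per rater.
    n_categories: number of categories available to the raters (k).
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("each case needs the same number (>= 2) of ratings")

    # Count ordered pairs of raters who chose the same category for a case.
    agreeing_pairs = 0
    for case in ratings:
        counts = Counter(case)
        agreeing_pairs += sum(c * (c - 1) for c in counts.values())

    # Observed agreement: share of rater pairs that agree, pooled over cases.
    p_observed = agreeing_pairs / (n_cases * n_raters * (n_raters - 1))
    # Chance agreement when raters are free to use any category.
    p_expected = 1.0 / n_categories
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings: 2 cases, 3 raters, 5 harm categories.
example = [
    ["death", "death", "death"],
    ["mild", "mild", "no harm"],
]
kappa = free_marginal_kappa(example, n_categories=5)
```

Unlike fixed-marginal statistics such as Fleiss' kappa, this version does not estimate chance agreement from the observed category frequencies, which is why the number of categories is passed in explicitly.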

Results: Overall, the IRR across all case types and raters was moderate (Kfmm=0.51), but it differed by case type and rater specialty. For harm severity, performance ranged from fair (medications and blood transfusion) to good (HAI); for duration of harm, performance ranged from moderate (medications) to good (HAI). Intra-disciplinary agreement ranged from fair to good across all 8 case types for both physicians and pharmacists, but was better (moderate to good) among nurses. Reliability analysis showed that a higher level of rating experience did not consistently produce higher reliabilities.

Generalizability analysis revealed that most of the variance (commonly referred to as 'measurement error') in the mean case rating is due to 'true' case differences in harm severity. Much smaller sources of variance included rater stringency and the interaction between case severity and rater stringency, resulting in high reliability estimates. Pharmacists were slightly more consistent in their ratings than either physicians or nurses, and expert raters were slightly more consistent than less experienced raters. Raters agreed most with each other when harm severity was 'death' and least when harm severity was 'mild harm'. On average, projected reliability for the Harm Scale reached 0.85 when the number of raters exceeded two.
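
The way projected reliability rises with the number of raters can be sketched as follows. This assumes a fully crossed case-by-rater design and an absolute-decision (dependability) coefficient in which both rater stringency and the case-by-rater interaction count as error. The abstract does not report the variance components or the exact coefficient used, so the function name and all numeric values below are hypothetical.

```python
def projected_dependability(var_case, var_rater, var_interaction, n_raters):
    """Dependability of a mean rating over n_raters raters.

    var_case: variance from true case differences in harm severity.
    var_rater: variance from rater stringency.
    var_interaction: case-by-rater interaction variance (with residual error).
    """
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    # Averaging over raters shrinks the error variance by a factor of n_raters.
    error_variance = (var_rater + var_interaction) / n_raters
    return var_case / (var_case + error_variance)


# Hypothetical variance components, chosen only to show the trend.
components = dict(var_case=0.70, var_rater=0.10, var_interaction=0.20)
projection = {
    n: projected_dependability(n_raters=n, **components) for n in (1, 2, 3, 5)
}
```

Because the error term is divided by the number of raters, the coefficient increases with each added rater, with diminishing gains as the panel grows.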

Conclusions: The IRR for the AHRQ Harm Scale is moderate, and most of the variance is due to case type and, to a much lesser extent, to rater stringency, specialty, or level of experience.

Learning Areas:

Conduct evaluation related to programs, research, and other areas of practice
Public health or related laws, regulations, standards, or guidelines

Learning Objectives:
Describe the inter-rater reliability of the Agency for Healthcare Research and Quality (AHRQ) Harm Scale for measuring severity and duration of adverse events. Discuss alternative approaches to measuring inter-rater reliability.

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: As a trained psychologist/psychometrician I am well versed in the development, evaluation and application of patient and clinician reported measures.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.

Session: 3368.0: Quality of care

141st APHA Annual Meeting and Exposition

Online Program
