227504 Analysis of Rater Reliabilities of Incomplete Designs: A Simulation of Partially Balanced Incomplete Designs and Empirical Application

Wednesday, November 10, 2010 : 10:50 AM - 11:10 AM

Yoon Soo Park, MS , National Center for Disaster Preparedness, Mailman School of Public Health, Columbia University, New York, NY
Tasha Stehling-Ariza, MPH , National Center for Disaster Preparedness, Columbia University, New York, NY
Jonathan J. Sury, MPH, CPH , National Center for Disaster Preparedness, Columbia University, New York, NY
The use of multiple raters/experts to code subjective data has become prevalent in public health over the past decades. Examples of such research include diagnosis of disease, peer evaluation of skills, and assessment of neighborhood recovery from disasters. However, rater-specific characteristics (e.g., leniency/strictness and psychological perceptions) call the objectivity of these assessments into question, so it becomes important to estimate rater reliabilities. The ideal design for estimating such statistics has been the complete (i.e., fully crossed) design, in which every rater in the pool codes every criterion. However, complete designs face challenges, including cost and resource constraints. As such, the question arises of how to construct an effective incomplete design (Fleiss, 1986). This question maps onto a familiar concept in experimental design, in which raters are viewed as treatments and subjects as blocks. This study simulates data to assess the effectiveness of Partially Balanced Incomplete Block Designs (PBIB), which can be easily implemented in practice, and compares them with other designs, such as the Balanced Incomplete Block Design (BIB) and a completely Unbalanced Design, using the Latent Class Signal Detection Theory (SDT) Model (DeCarlo, 2003). An empirical analysis of multiple-rater coding measuring the effects of cohesion in post-Katrina neighborhoods demonstrates the application of this method.
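The raters-as-treatments, subjects-as-blocks idea can be made concrete with a small sketch (not part of the abstract; the parameters are illustrative, not the study's). Below, 7 hypothetical raters are assigned to 7 subjects under a (v=7, b=7, k=3, r=3, λ=1) balanced incomplete block design, so each subject is coded by only 3 raters, yet every rater codes the same number of subjects and every pair of raters shares exactly one subject:

```python
# Illustrative sketch: a (7, 7, 3, 3, 1) balanced incomplete block design
# assigning 7 raters (treatments) to 7 subjects (blocks). Each subject is
# coded by k = 3 raters rather than by the full pool.
from itertools import combinations
from collections import Counter

# Blocks of the classical (7, 3, 1) design (the Fano plane);
# entries are rater indices, one tuple per subject.
blocks = [
    (0, 1, 2), (0, 3, 4), (0, 5, 6),
    (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5),
]

def bib_properties(blocks):
    """Count replications per rater and co-occurrences per rater pair."""
    reps = Counter(r for blk in blocks for r in blk)
    pairs = Counter(p for blk in blocks for p in combinations(sorted(blk), 2))
    return reps, pairs

reps, pairs = bib_properties(blocks)
print(set(reps.values()))   # each rater codes r = 3 subjects -> {3}
print(set(pairs.values()))  # every rater pair shares lambda = 1 subject -> {1}
```

The balance properties (equal replication r and constant pairwise co-rating λ) are what make every pair of raters directly comparable; a PBIB relaxes λ to a small set of values, which is easier to implement in practice at the cost of some balance.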

Learning Areas:
Biostatistics, economics
Epidemiology
Public health or related research

Learning Objectives:
- Discuss methods to estimate rater reliability in incomplete designs
- Compare various incomplete designs (balanced incomplete block designs, partially balanced incomplete block designs, and unbalanced designs) using simulated data and measure their efficiency when estimating rater reliability
- Evaluate classification accuracy given estimated rater characteristics
- Demonstrate application using empirical data

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I am a Ph.D. candidate in Measurement and Statistics at Columbia University. My dissertation focuses on analyzing incomplete designs to estimate rater characteristics and classification accuracy. I am also a Data Manager/Analyst at the National Center for Disaster Preparedness, which provided the empirical data for this study.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.