241101 Classification errors in regression models: A Bayesian semi-parametric approach to inference

Monday, October 31, 2011: 11:10 AM

Martijn van Hasselt, PhD, Behavioral Health Economics Program, RTI International, Research Triangle Park, NC
We consider the problem of classification errors in categorical data. For example, an intent-to-treat indicator may be a poor proxy for actual treatment received, due to noncompliance or sample attrition. A substance use indicator in self-reported survey data may be inaccurate due to poor recall or an unwillingness to tell the truth. We propose a semi-parametric Bayesian model that accounts for classification errors in a regression context. This has two important advantages compared to existing approaches. First, parametric models (Bayesian and non-Bayesian) are often based on strong assumptions. Such assumptions sometimes lack a substantive foundation and, if incorrectly imposed, can severely bias statistical inference. Our modeling approach allows a researcher to relax these assumptions, both through the prior distribution and the likelihood. For example, rather than fixing the classification error probability at some value, we can specify a distribution over reasonable values. Second, in the absence of strong assumptions, certain model parameters are typically no longer identified. It is often possible, however, to derive upper and lower parameter bounds that can be estimated from the data. The advantage of a Bayesian relative to a non-Bayesian model is that we can also make statements about the likely values of the parameters within the bounds. This has considerable practical value, since in some applications the bounds are far apart. We apply our methods in an empirical context to data from the National Survey on Drug Use and Health, and re-evaluate the relation between socioeconomic characteristics, substance use behavior and treatment outcomes.
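The idea of replacing a fixed classification error probability with a distribution over reasonable values can be illustrated in a much simpler setting than the regression model described above. The sketch below is not the presented method: it only corrects an observed prevalence for misclassification, and the function names, the Beta prior parameters, and the observed rate of 0.30 are all hypothetical choices made for illustration.

```python
import random


def corrected_prevalence(q, fp, fn):
    """Invert q = p * (1 - fn) + (1 - p) * fp for the true prevalence p.

    q  : observed share of positive reports
    fp : probability a true negative is recorded as positive
    fn : probability a true positive is recorded as negative
    """
    denom = 1.0 - fp - fn
    if denom <= 0:
        raise ValueError("error rates must satisfy fp + fn < 1")
    p = (q - fp) / denom
    return min(1.0, max(0.0, p))  # clip to the unit interval


def prevalence_draws(q, n_draws=10000, seed=0):
    """Propagate prior uncertainty about the error rates to the prevalence."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        fp = rng.betavariate(2, 38)  # hypothetical prior, mean 0.05
        fn = rng.betavariate(4, 16)  # hypothetical prior, mean 0.20
        if fp + fn >= 1.0:
            continue  # skip draws where the inversion is undefined
        draws.append(corrected_prevalence(q, fp, fn))
    return sorted(draws)


draws = prevalence_draws(q=0.30)
lower = draws[int(0.025 * len(draws))]
upper = draws[int(0.975 * len(draws))]
print(f"central 95% interval for the true prevalence: [{lower:.3f}, {upper:.3f}]")
```

Fixing the error rates at single values would return one corrected number; drawing them from a prior instead yields a distribution, which is the sense in which statements can be made about likely values within a range.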

Learning Areas:
Biostatistics, economics
Epidemiology

Learning Objectives:
Formulate a regression model that incorporates classification errors.
Evaluate the impact of assumptions about the classification error on identification of the structural parameters.
Compare Bayesian and non-Bayesian inference in the presence of classification errors.

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I have conducted research in econometrics for my PhD degree and in my previous position as a faculty member in the Economics Department at the University of Western Ontario.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.