Diagnosing patterns of matches/non-matches and missing data when working with large merged databases

3231.0: Monday, November 13, 2000 - 5:00 PM

Abstract #7186

Peter Hovmand, MSW¹, Harry Perlstadt, PhD, MPH², Susan Grettenberger, PhD, MSW³, and Jim Kent³. (1) School of Social Work, Michigan State University, 254 Baker Hall, Michigan State University, East Lansing, MI 48825, 517-353-9999, ext. 2, hovmandp@pilot.msu.edu, (2) Department of Sociology, Michigan State University, (3) Michigan Department of Community Health

The state maintains two major statewide HIV/AIDS-related databases with unduplicated records, along with two additional databases that have supplemental variables for a subset of individuals. All four databases use the same unique encrypted identifier for an individual, which makes it theoretically possible to generate and conduct analyses on the merged database. Analyzing merged databases with different variables introduces two problems: (1) validity of the matches and non-matches and (2) missing data. Failure to consider the impact of matches/non-matches and missing data can have major undesirable consequences when trying to draw inferences from the merged databases. Hence, a major aspect of handling the problems associated with matching methodology and missing data involves diagnosing the pattern of matches/non-matches and missing data. This can be quite difficult when the number of cases and variables is large (merged databases with more than 10,000 records and 500 variables). This paper presents a method for visualizing and diagnosing patterns of matches/non-matches and missing data in large databases. Implications for assessing the impact of matching errors and data imputation are also discussed.
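
The abstract does not specify the diagnostic method itself, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes two hypothetical toy databases keyed by a shared identifier, with distinct variable names in each, and tabulates how many records fall into each combination of match status and missing-data pattern. All record values, variable names, and function names here are invented for illustration.

```python
from collections import Counter

# Hypothetical toy records keyed by a shared encrypted identifier.
registry = {"a1": {"age": 34, "county": "Ingham"},
            "b2": {"age": None, "county": "Wayne"},
            "c3": {"age": 51, "county": None}}
services = {"a1": {"visits": 4},
            "c3": {"visits": None},
            "d4": {"visits": 2}}

def match_status(left, right):
    """Classify every identifier as matched, left-only, or right-only."""
    status = {}
    for key in sorted(set(left) | set(right)):
        if key in left and key in right:
            status[key] = "both"
        elif key in left:
            status[key] = "left_only"
        else:
            status[key] = "right_only"
    return status

def missing_patterns(left, right, left_vars, right_vars):
    """Count records per (match status, missingness pattern).

    Each pattern is a string with one character per variable:
    '1' = observed, '0' = missing. A variable from a database the
    record does not appear in is counted as missing.
    Assumes variable names do not collide across the two databases.
    """
    status = match_status(left, right)
    counts = Counter()
    for key, st in status.items():
        merged = {**left.get(key, {}), **right.get(key, {})}
        pattern = "".join(
            "1" if merged.get(var) is not None else "0"
            for var in left_vars + right_vars
        )
        counts[(st, pattern)] += 1
    return counts

patterns = missing_patterns(registry, services, ["age", "county"], ["visits"])
for (st, pattern), n in sorted(patterns.items()):
    print(f"{st:<10} {pattern} {n}")
```

With hundreds of variables, a table like this would typically be sorted by frequency or rendered as a heat map, since the number of distinct patterns can grow quickly; that step is left out of the sketch.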

Learning Objectives: At the conclusion of this session, participants will be able to:

Identify two problems of handling missing data in the analysis of merged databases.
Apply a method for visualizing the pattern of missing data and matches/non-matches to large merged databases.
Select a data imputation method appropriate to each pattern of matching/non-matching and missing data.

Keywords: Information Databases, Statistics

Presenting author's disclosure statement:
Organization/institution whose products or services will be discussed: None
I do not have any significant financial interest/arrangement or affiliation with any organization/institution whose products or services are being discussed in this session.

The 128th Annual Meeting of APHA