The state maintains two major statewide HIV/AIDS-related databases with unduplicated records, along with two additional databases that contain supplemental variables for a subset of individuals. All four databases use the same unique encrypted identifier for each individual, which makes it theoretically possible to build a merged database and conduct analyses on it. Analyzing merged databases with different variables introduces two problems: (1) the validity of matches and non-matches and (2) missing data. Failing to consider the impact of matches/non-matches and missing data can seriously undermine inferences drawn from the merged databases. Hence, a key step in handling the problems associated with the matching methodology and with missing data is diagnosing the patterns of matches/non-matches and of missing data. This can be quite difficult when the number of cases and variables is large (e.g., merged databases with more than 10,000 records and 500 variables). This paper presents a method for visualizing and diagnosing patterns of matches/non-matches and missing data in large databases. Implications for assessing the impact of matching errors and for data imputation are also discussed.
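The kind of pattern diagnosis described above can be sketched as a simple tabulation. This is a minimal illustration, not the authors' method. It assumes each database is held in memory as a mapping from the encrypted identifier to a record of variables, that `None` marks a missing value, and that variable names do not overlap across databases. The database names, the variable names (`dx_year`, `cd4`), and the helper functions are all hypothetical. The sketch counts how many identifiers fall into each combination of databases (the match/non-match pattern) and how many merged records share each combination of observed and missing variables (the missing-data pattern).

```python
from collections import Counter
from typing import Dict, List, Optional

Record = Dict[str, Optional[object]]
Database = Dict[str, Record]  # keyed by the (hypothetical) encrypted identifier


def match_patterns(dbs: Dict[str, Database]) -> Counter:
    """Count identifiers by which databases contain them (1 = present, 0 = absent).

    Pattern positions follow the database names in sorted order.
    """
    names = sorted(dbs)
    all_ids = set().union(*(dbs[n].keys() for n in names))
    return Counter(tuple(int(uid in dbs[n]) for n in names) for uid in all_ids)


def merge(dbs: Dict[str, Database], variables: List[str]) -> Dict[str, Record]:
    """Full outer merge on the identifier; variables never observed for an ID stay None.

    Assumes variable names do not overlap across databases.
    """
    merged: Dict[str, Record] = {}
    for db in dbs.values():
        for uid, rec in db.items():
            row = merged.setdefault(uid, {v: None for v in variables})
            for var, val in rec.items():
                if val is not None:
                    row[var] = val
    return merged


def missing_patterns(merged: Dict[str, Record], variables: List[str]) -> Counter:
    """Count merged records by missing-data pattern (1 = observed, 0 = missing)."""
    return Counter(
        tuple(int(row.get(v) is not None) for v in variables) for row in merged.values()
    )


# Hypothetical toy data: two small databases sharing one identifier ("a1").
dbs = {
    "surveillance": {"a1": {"dx_year": 2001}, "b2": {"dx_year": 2003}},
    "care": {"a1": {"cd4": 350}, "c3": {"cd4": None}},
}
variables = ["dx_year", "cd4"]

matches = match_patterns(dbs)  # positions: ("care", "surveillance")
merged = merge(dbs, variables)
missing = missing_patterns(merged, variables)

for pattern, n in matches.most_common():
    print("match pattern", pattern, n)
for pattern, n in missing.most_common():
    print("missing pattern", pattern, n)
```

Tabulating distinct patterns, rather than inspecting every cell, reduces a 10,000 × 500 merged table to a frequency list of patterns. That list could then feed a visualization (for example, a pattern-by-variable grid), and it gives a natural starting point for judging whether missingness clusters within particular source databases.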
Learning Objectives: At the conclusion of this session, participants will be able to:
Keywords: Information Databases, Statistics
Presenting author's disclosure statement:
Organization/institution whose products or services will be discussed: None
I do not have any significant financial interest/arrangement or affiliation with any organization/institution whose products or services are being discussed in this session.