Background: Probabilistic record linkage allows injury control researchers to combine diverse databases to study the outcome of injury. While probabilistic linkage has been used to link motor vehicle crash (MVC), emergency medical services (EMS), and hospital records, applications exist for other types of injuries. Unfortunately, confidentiality concerns have made agencies reluctant to release personal identifiers that may be needed to link databases. Objective: This study investigates the effect of varying levels of identifying information on probabilistic linkage. Methods: We artificially created two databases in which the number of true matches was known. Linkages were run with varying degrees of name information (names, Soundex, initials, and no names), additional identifiers (all information; no date of birth (DOB); no DOB or location identifiers; no DOB, location, or time identifiers), and error rates (0%, 1%, 5%, 10%, 25%). Sensitivity and specificity were calculated for each level of name information, additional information, and error rate. Results: The best results were achieved using full names and Soundex (sensitivity and specificity > 0.95). Linkages using initials performed well (sensitivity and specificity > 0.9) until DOB, location, and time identifiers were removed. Using all available non-name identifiers, the no-name linkage performed well (sensitivity and specificity > 0.96). However, with other non-name identifiers absent, the performance of the no-name linkages dropped. Conclusions: While linkage was possible with no name information, results were poor when other identifiers were also removed. To ensure accurate linkages, researchers and data owners should develop methods that guarantee adequate identifying information is available while protecting patients’ rights.
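Illustration: The four levels of name information and the two accuracy measures named in the Methods can be sketched in a few lines of Python. This is a minimal sketch, not the study's software. It assumes standard American Soundex, and it assumes sensitivity and specificity are computed over candidate record pairs, which the abstract does not state. The names `soundex`, `name_key`, and `sensitivity_specificity` are hypothetical, and the sketch does not perform the probabilistic linkage itself.

```python
# Map consonants to Soundex digits (standard American Soundex groups).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character American Soundex code of a name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


def name_key(name, level):
    """Reduce a name to one of the four levels of name information.

    level is "full", "soundex", "initial", or "none" (hypothetical labels).
    """
    if level == "full":
        return name.upper()
    if level == "soundex":
        return soundex(name)
    if level == "initial":
        return name[:1].upper()
    return ""  # "none": no name information is released


def sensitivity_specificity(linked, truth, n_pairs):
    """Pair-level accuracy (assumed definition).

    linked:  set of (id_a, id_b) pairs declared links
    truth:   set of (id_a, id_b) pairs that are true matches
    n_pairs: total number of candidate pairs compared
    """
    tp = len(linked & truth)
    fp = len(linked - truth)
    fn = len(truth - linked)
    tn = n_pairs - tp - fp - fn
    return tp / (tp + fn), tn / (tn + fp)
```

For example, `name_key("Robert", "soundex")` and `name_key("Rupert", "soundex")` give the same key, which is why Soundex tolerates some spelling errors that an exact full-name comparison would not.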
Learning Objectives: At the conclusion of this session, the participant will be able to: 1. Describe the impact of name information on probabilistic record linkage. 2. Discuss the usefulness of, and the amount of information contained in, different linkage variables. 3. Understand the need for probabilistic linkage in injury control research.
Keywords: Data/Surveillance, Methodology
Presenting author's disclosure statement:
Organization/institution whose products or services will be discussed: None
I do not have any significant financial interest/arrangement or affiliation with any organization/institution whose products or services are being discussed in this session.