200887 Facilitating health data sharing: The Connecticut Health Information Network's innovative approach to data integration and data dissemination

Monday, November 9, 2009: 4:50 PM

Robert H. Aseltine, PhD, Behavioral Sciences and Community Health, University of Connecticut Health Center, Farmington, CT
Ofer Harel, PhD, Department of Statistics, University of Connecticut, Storrs, CT
Sanguthevar Rajasekaran, PhD, Department of Computer Science & Engineering, University of Connecticut, East Hartford, CT
Cal Collins, Akaza Research, Cambridge, MA
Background. The Connecticut Health Information Network (CHIN) provides an innovative technological platform for promoting data sharing among state agencies, hospitals, and healthcare providers. CHIN offers researchers, policymakers, and health practitioners the opportunity to access de-identified data from multiple sources for purposes of research, program evaluation, planning, and monitoring.

Objective. This presentation provides an overview of the CHIN infrastructure, featuring the network's innovative approaches to data integration and data dissemination.

Methods. Synthetic and real health datasets were used to assess the adequacy of two features of CHIN: (1) a canopy clustering algorithm used for probabilistic integration of multiple datasets that do not contain unique numeric identifiers; (2) a method for quantifying disclosure risk with ostensibly de-identified data, particularly in datasets that contain outliers or rare elements (e.g., rare diagnoses) that may pose threats to privacy in the absence of personal information.
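
As an illustration of the first feature, the general canopy clustering technique can be sketched as follows. This is a minimal sketch and not CHIN's implementation: the bigram-based Jaccard distance, the threshold values, the function names, and the sample names are all assumptions chosen for the example. The idea is that a cheap distance groups records into overlapping canopies, so that the more expensive probabilistic comparison only needs to be run on pairs that share a canopy.

```python
def bigrams(name):
    """Return the set of character bigrams of a lowercased name without spaces."""
    s = name.lower().replace(" ", "")
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard_distance(a, b):
    """Cheap distance (assumed for this sketch): 1 - |A & B| / |A | B| on bigram sets."""
    set_a, set_b = bigrams(a), bigrams(b)
    if not set_a and not set_b:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)


def canopies(records, loose=0.7, tight=0.3):
    """Group records into possibly overlapping canopies.

    loose and tight are illustrative thresholds with loose >= tight.
    """
    remaining = list(records)
    result = []
    while remaining:
        center = remaining[0]
        # Every remaining record within the loose threshold joins this canopy.
        canopy = [r for r in remaining if jaccard_distance(center, r) <= loose]
        result.append(canopy)
        # Records within the tight threshold (including the center itself)
        # are removed and cannot start or join later canopies.
        remaining = [r for r in remaining if jaccard_distance(center, r) > tight]
    return result


if __name__ == "__main__":
    # Hypothetical names, not drawn from any real dataset.
    sample = ["Robert Smith", "Rob Smith", "Robert Smyth",
              "Maria Lopez", "Mariah Lopez"]
    for group in canopies(sample):
        print(group)
```

Because canopies may overlap, a record such as a nickname variant can be compared against several candidate groups, while records with no shared bigrams are never paired at all.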

Results. First, CHIN's probabilistic matching approach provides a solution that is both accurate and efficient for the simultaneous integration of 3 or more data sources, and the use of auxiliary information from databases containing common misspellings and nicknames significantly improves matching accuracy. Second, tests of a probabilistic differential privacy scheme developed by the authors suggest that configurations of data that may compromise individual privacy (even when personal information has been removed) can be identified, and that this information can be used to mitigate the risk of disclosure.
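
To make the notion of a risky data configuration concrete, the sketch below flags combinations of attribute values that occur fewer than k times in a de-identified table. This is a simple frequency check in the spirit of k-anonymity, swapped in for illustration only; it is not the authors' probabilistic differential privacy scheme, and the function name, column names, threshold, and rows are all hypothetical.

```python
from collections import Counter


def rare_combinations(rows, quasi_identifiers, k=3):
    """Return attribute-value combinations that appear fewer than k times.

    rows is a list of dicts; quasi_identifiers names the columns that,
    taken together, might single out an individual (an assumption of this sketch).
    """
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    counts = Counter(keys)
    return {combo: n for combo, n in counts.items() if n < k}


if __name__ == "__main__":
    # Hypothetical de-identified records, invented for the example.
    table = [
        {"zip3": "060", "age_band": "40-49", "diagnosis": "asthma"},
        {"zip3": "060", "age_band": "40-49", "diagnosis": "asthma"},
        {"zip3": "060", "age_band": "40-49", "diagnosis": "asthma"},
        {"zip3": "062", "age_band": "70-79", "diagnosis": "rare disorder"},
    ]
    print(rare_combinations(table, ["zip3", "age_band", "diagnosis"]))
```

Flagged combinations could then be suppressed or generalized (for example, by widening the age band) before the data are released.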

Conclusions. The Connecticut Health Information Network provides innovative solutions to thorny problems in data integration and data dissemination that hinder efforts to promote health data sharing.

Learning Objectives:
- Describe an innovative platform for the sharing of public health data
- Evaluate a unique solution to the probabilistic integration of 3 or more data sources
- Evaluate a new approach to assessing risks to individual privacy in ostensibly de-identified health data

Keywords: Health Information Systems, Public Health Research

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I am director of the Connecticut Health Information Network and am leading the research teams developing the novel solutions described in the abstract.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.