250270 Validation of masking techniques for location privacy protection of individual-level health data

Tuesday, November 1, 2011: 1:30 PM

Paul Zandbergen, PhD, Department of Geography, University of New Mexico, Albuquerque, NM
Geographic identifiers in public health datasets present unique challenges when deciding how to release these datasets for secondary analysis. When locations of individual-level health data are released in the form of paper or digital maps, the individuals could be re-identified through reverse geocoding. Spatial datasets therefore cannot be released unless the locations have been modified, for example through aggregation or geographic masking. Masking techniques apply transformations or perturbations to the locations to prevent the re-identification of individuals. Despite substantial attention from the research community in recent years, there is still limited confidence in the ability of masking techniques to reliably protect individual privacy. The current study provides an empirical examination of the trade-offs between the need for individual privacy protection and the benefits of having individual-level health data for spatial analysis. This was accomplished by validating the performance of a set of existing and newly developed masking techniques using high-resolution health datasets for study areas with varying characteristics. Spatial k-anonymity of individual-level health data was determined using an n-th nearest neighbor analysis of the masked data, providing an empirical estimate of the probability of discovery. Artificial clusters were introduced to examine the effect of masking on spatial analytic methods, including both global and local measures of spatial clustering. Results indicate that the performance of masking techniques varies with population density, type of masking, and masking parameters employed. A set of “best practices” for using geographic masking techniques is presented.
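
To make the idea of perturbation masking and a nearest-neighbor estimate of spatial k-anonymity concrete, here is a minimal Python sketch. It is not the method used in the study: the function names (`donut_mask`, `spatial_k`), the choice of a "donut" random displacement, and the definition of k as the number of candidate locations at least as close to the masked point as the true location are all illustrative assumptions. Coordinates are assumed to be in a projected system with units of meters.

```python
import math
import random


def donut_mask(points, r_min, r_max, rng):
    """Displace each (x, y) point by a random distance in [r_min, r_max]
    in a uniformly random direction (hypothetical 'donut' mask)."""
    masked = []
    for x, y in points:
        angle = rng.uniform(0.0, 2.0 * math.pi)
        dist = rng.uniform(r_min, r_max)
        masked.append((x + dist * math.cos(angle), y + dist * math.sin(angle)))
    return masked


def spatial_k(true_pt, masked_pt, candidates):
    """Assumed definition of spatial k-anonymity for one record: the number
    of candidate locations (e.g. residential addresses) that lie at least
    as close to the masked point as the true location does."""
    d_true = math.dist(masked_pt, true_pt)
    return sum(1 for c in candidates if math.dist(masked_pt, c) <= d_true)


# Synthetic example: candidate locations on a 100 m grid, with a few of
# them treated as case locations. These data are made up for illustration.
rng = random.Random(42)
candidates = [(100.0 * i, 100.0 * j) for i in range(20) for j in range(20)]
cases = rng.sample(candidates, 5)
masked_cases = donut_mask(cases, r_min=50.0, r_max=300.0, rng=rng)

k_values = [spatial_k(t, m, candidates) for t, m in zip(cases, masked_cases)]
```

Under this definition, a larger k means more candidate locations are as plausible as the true one, so 1/k can be read as a rough upper bound on the probability of discovery for that record. Denser candidate grids or larger displacement radii tend to raise k, which is consistent with the dependence on population density and masking parameters noted in the abstract.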

Learning Areas:
Ethics, professional and legal requirements
Public health or related laws, regulations, standards, or guidelines

Learning Objectives:
1. Participants will be able to identify potential privacy risks associated with the analysis of health datasets containing geographic identifiers.
2. Participants will be able to describe strategies that can be utilized for location privacy protection.
3. Participants will be able to explain strengths and weaknesses of different geographic masking techniques.

Keywords: Privacy, Geographic Information Systems

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: As a researcher I have worked on privacy issues related to health datasets for a number of years. I will be presenting results from two federally funded research projects.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.