262664 A Toolbox for Geographic Masking to Protect Confidentiality of Individual-Level Geocoded Data

Tuesday, October 30, 2012

Paul A. Zandbergen, PhD, Department of Geography, University of New Mexico, Albuquerque, NM
Kathryn E. Lenzer, Department of Geography, University of New Mexico, Albuquerque, NM
Su Zhang, Department of Geography, University of New Mexico, Albuquerque, NM
When the locations of individual-level health records are released as paper or digital maps, the individuals can be re-identified through reverse geocoding. Spatial datasets therefore cannot be released unless the locations have been modified, for example through aggregation or geographic masking. Masking techniques apply transformations or perturbations to the locations to prevent the re-identification of individuals. As part of a larger project on spatial data confidentiality, a toolbox was developed for geographic masking of individual-level datasets. The toolbox includes both existing and newly developed masking algorithms: 1) random direction, fixed radius; 2) random perturbation within a circle; 3) Gaussian displacement; 4) donut masking; 5) bimodal Gaussian displacement; 6) location swapping; and 7) location swapping with donut masking. The GIS-based toolbox was implemented using ModelBuilder and Python scripting. It includes detailed instructions and has been released in the public domain for use by researchers and health agencies. The performance of the various masking tools was validated using measures of spatial κ-anonymity, which provides an empirical estimate of the probability of discovery. The experience of several public health agencies using the toolbox will be discussed.
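As an illustration of the perturbation-based techniques listed above, the following is a minimal Python sketch of donut masking. It is not the toolbox's code, which the abstract does not describe. It assumes coordinates are in a projected system with linear units (for example meters), and the function name `donut_mask` and the parameters `r_min` and `r_max` are hypothetical names chosen for this example. Each point is moved in a random direction by a distance between a minimum and a maximum radius, with the distance drawn so that masked locations are uniform by area within the ring.

```python
import math
import random


def donut_mask(x, y, r_min, r_max, rng=random):
    """Displace a projected point (x, y) into a ring around its true location.

    Illustrative sketch only: r_min and r_max are hypothetical parameters
    giving the inner and outer displacement radii in map units.
    """
    if not 0 <= r_min <= r_max:
        raise ValueError("require 0 <= r_min <= r_max")
    # Random direction, uniform over the full circle.
    theta = rng.uniform(0.0, 2.0 * math.pi)
    # Draw the squared radius uniformly so that points are spread evenly
    # by area across the ring rather than clustered near the inner edge.
    r = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
    return x + r * math.cos(theta), y + r * math.sin(theta)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the example is repeatable
    original = (350000.0, 3880000.0)  # made-up projected coordinates
    masked = donut_mask(*original, r_min=50.0, r_max=200.0, rng=rng)
    print("masked location:", masked)
```

Under the same assumptions, setting `r_min` equal to `r_max` reduces this sketch to the random direction, fixed radius technique, and setting `r_min` to zero gives random perturbation within a circle.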

Learning Areas:
Epidemiology
Ethics, professional and legal requirements
Public health or related laws, regulations, standards, or guidelines

Learning Objectives:
1. Participants will be able to identify potential confidentiality risks associated with the analysis of health datasets containing geographic identifiers.
2. Participants will be able to describe strategies that can be utilized for location privacy protection.
3. Participants will be able to explain strengths and weaknesses of different geographic masking techniques.

Keywords: Geocoding, Privacy

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I am the PI for an NIH-funded research effort on confidentiality and privacy issues associated with geocoded health datasets. The research examines the performance of geographic masking techniques in protecting the location privacy of such datasets.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.