A multi-objective approach to data sharing with privacy constraints and preference based objectives

Public data sharing is used in a number of businesses to facilitate the exchange of information. Privacy constraints are usually enforced to prevent unwanted inference of information, especially when the shared data contain sensitive personal attributes. Enforcing them, however, reduces the utility of the data for statistical studies, so a requirement while modifying the data is to minimize the information loss. Existing methods employ the notion of "minimal distortion", where the data are modified only to the extent necessary to satisfy the privacy constraint, and on that basis assert that the information loss has been minimized. Given the subjective nature of information loss, this assertion is often difficult to justify. In this paper, we propose an evolutionary algorithm that explicitly minimizes an achievement function subject to constraints on the privacy level of the transformed data. Privacy constraints specified in terms of anonymity models are modeled as additional objectives, which leads to an evolutionary multi-objective approach. We highlight the requirement to minimize any bias induced by the anonymity model and present, as the achievement function, a scalarization that incorporates preferences in information loss and privacy bias.
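The abstract does not give the exact form of the achievement function, so the following is only a minimal sketch under stated assumptions. It assumes k-anonymity as the anonymity model and a generic augmented weighted Chebyshev scalarization over two objectives, information loss and privacy bias. The function names, the reference point, the weights, and the `rho` parameter are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import Counter
from typing import Mapping, Sequence


def k_anonymity_level(rows: Sequence[Mapping[str, str]],
                      quasi_identifiers: Sequence[str]) -> int:
    """Return the size of the smallest equivalence class.

    Rows that agree on every quasi-identifier column form one class;
    a table is k-anonymous when this value is at least k.
    """
    groups = Counter(tuple(row[c] for c in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def achievement(objectives: Sequence[float],
                reference: Sequence[float],
                weights: Sequence[float],
                rho: float = 1e-3) -> float:
    """Augmented weighted Chebyshev scalarization (assumed form).

    objectives: values to minimize, e.g. (information loss, privacy bias)
    reference:  preferred (aspiration) value for each objective
    weights:    relative importance of each objective
    rho:        small augmentation coefficient (hypothetical default)
    """
    terms = [w * (f - z) for f, z, w in zip(objectives, reference, weights)]
    return max(terms) + rho * sum(terms)


# Toy generalized table: two classes of two rows each.
table = [
    {"age": "30-39", "zip": "481**"},
    {"age": "30-39", "zip": "481**"},
    {"age": "40-49", "zip": "482**"},
    {"age": "40-49", "zip": "482**"},
]
level = k_anonymity_level(table, ["age", "zip"])
score = achievement([0.4, 0.2], reference=[0.1, 0.1], weights=[1.0, 2.0])
```

In an evolutionary search, a candidate generalization could be scored with `achievement` and checked against the required privacy level with `k_anonymity_level`; how the paper combines the two is not stated in the abstract.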
