Exploring privacy versus data quality trade-offs in anonymization techniques using multi-objective optimization

Data anonymization techniques have received extensive attention in the privacy research community over the past several years. Various models of privacy preservation have been proposed: k-anonymity, l-diversity and t-closeness, to name a few. An oft-cited drawback of these models is the considerable loss in data quality arising from the use of generalization and suppression techniques. Optimization attempts in this context have so far focused on maximizing data utility for a pre-specified level of privacy. To determine whether better privacy levels are obtainable at the same level of data utility, the majority of existing formulations require exhaustive analysis. Further, the data publisher's perspective is often missed in the process. The publisher wishes to maintain a given level of data utility, since data utility is the revenue earner, and then maximize the level of privacy within acceptable limits. In this paper, we explore this privacy versus data quality trade-off as a multi-objective optimization problem. Our goal is to provide substantial information to a data publisher about the trade-offs available between the privacy level and the information content of an anonymized data set.
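As a rough illustration of what treating the trade-off as a multi-objective problem means, the sketch below filters a handful of candidate anonymizations down to the non-dominated (Pareto-optimal) ones. It is not the method of the paper: the candidate values, the use of a k-style privacy level and a normalized information-loss score as the two objectives, and the names `dominates` and `pareto_front` are all assumptions made for this example.

```python
from typing import List, Tuple

# Each candidate anonymization is (privacy_level, information_loss).
# The numbers are invented for illustration only.
# Higher privacy_level is better; lower information_loss is better.
Candidate = Tuple[int, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b in both objectives
    and strictly better in at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return sorted(
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    )


# Hypothetical candidates, e.g. produced by different generalization schemes.
candidates = [(2, 0.10), (5, 0.25), (5, 0.40), (10, 0.30), (3, 0.30), (20, 0.70)]
front = pareto_front(candidates)

for privacy_level, information_loss in front:
    print(f"privacy level {privacy_level:>2}  information loss {information_loss:.2f}")
```

Each surviving pair is one trade-off a publisher could choose: no other candidate in the set offers more privacy without also losing more information. A publisher who fixes an acceptable information loss can then read off the highest privacy level available at or below that loss.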
