k-Anonymization in the Presence of Publisher Preferences

Privacy constraints are typically enforced on shared data that contain sensitive personal attributes. Because sanitization reduces the utility of the data, the information loss it causes must be minimized. Existing methods modify the data only to the extent necessary to satisfy the privacy constraints and, on that basis, assert that information loss has been minimized. Given the subjective nature of information loss, however, such an assertion is often difficult to justify. In this paper, we propose an interactive procedure to generate a data generalization scheme that optimally meets the preferences of the data publisher. The data publisher guides the sanitization process by specifying aspirations in terms of desired achievement levels in the objectives. If the generated scheme is not acceptable, a reference direction based methodology is used to investigate neighboring solutions. The approach draws its power from the constructive input the publisher provides about the suitability of a solution before a new one is sought.
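To make the idea of aspiration levels concrete, the following is a minimal sketch of how a publisher's desired achievement levels could be used to rank candidate generalization schemes. It is an illustration and not the procedure of this paper: it assumes every objective is expressed so that smaller is better, it uses a common max-based achievement scalarizing function, and the scheme names, objective values, weights, and the `rho` parameter are all hypothetical.

```python
from typing import Dict, Sequence


def achievement(objectives: Sequence[float],
                aspiration: Sequence[float],
                weights: Sequence[float],
                rho: float = 1e-6) -> float:
    """Score a scheme against the aspiration levels (lower is better).

    Assumes all objectives are minimized. The max term measures the worst
    weighted shortfall from the aspiration point; the small rho-weighted sum
    is an augmentation term that breaks ties between equal worst shortfalls.
    """
    gaps = [w * (f - z) for f, z, w in zip(objectives, aspiration, weights)]
    return max(gaps) + rho * sum(gaps)


def closest_to_aspiration(candidates: Dict[str, Sequence[float]],
                          aspiration: Sequence[float],
                          weights: Sequence[float]) -> str:
    """Return the name of the candidate with the smallest achievement score."""
    return min(candidates,
               key=lambda name: achievement(candidates[name], aspiration, weights))


# Hypothetical schemes described by (information loss, privacy risk),
# both scaled to [0, 1] and both to be minimized.
candidates = {
    "scheme_a": (0.20, 0.50),
    "scheme_b": (0.35, 0.20),
    "scheme_c": (0.60, 0.10),
}

# The publisher aspires to at most 0.30 information loss and 0.20 risk.
best = closest_to_aspiration(candidates, aspiration=(0.30, 0.20), weights=(1.0, 1.0))
print(best)  # scheme_b
```

If the selected scheme is not acceptable, the publisher can state a different aspiration point and call `closest_to_aspiration` again. This only mimics the interactive loop over a fixed set of candidates; a reference direction method would instead search for new solutions between the current one and the revised aspiration point.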
