Minimax Optimal Procedures for Locally Private Estimation

ABSTRACT Working under a model of privacy in which data remain private even from the statistician, we study the tradeoff between privacy guarantees and the risk of the resulting statistical estimators. We develop private versions of classical information-theoretical bounds, in particular those due to Le Cam, Fano, and Assouad. These inequalities allow for a precise characterization of statistical rates under local privacy constraints and the development of provably (minimax) optimal estimation procedures. We provide a treatment of several canonical families of problems: mean estimation and median estimation, generalized linear models, and nonparametric density estimation. For all of these families, we provide lower and upper bounds that match up to constant factors, and exhibit new (optimal) privacy-preserving mechanisms and computationally efficient estimators that achieve the bounds. Additionally, we present a variety of experimental results for estimation problems involving sensitive data, including salaries, censored blog posts and articles, and drug abuse; these experiments demonstrate the importance of deriving optimal procedures. Supplementary materials for this article are available online.

[1]  S L Warner,et al.  Randomized response: a survey technique for eliminating evasive answer bias. , 1965, Journal of the American Statistical Association.

[2]  L. Lecam Convergence of Estimates Under Dimensionality Restrictions , 1973 .

[3]  D. W. Scott On optimal and data based histograms , 1979 .

[4]  P. Assouad Deux remarques sur l'estimation , 1983 .

[5]  P. Brucker Review of recent development: An O( n) algorithm for quadratic knapsack problems , 1984 .

[6]  George T. Duncan,et al.  Disclosure-Limited Data Dissemination , 1986 .

[7]  P. Hall,et al.  Optimal Rates of Convergence for Deconvolving a Density , 1988 .

[8]  D. Lambert,et al.  The Risk of Disclosure for Microdata , 1989 .

[9]  J. Massey CAUSALITY, FEEDBACK AND DIRECTED INFORMATION , 1990 .

[10]  R. Gray Entropy and Information Theory , 1990, Springer New York.

[11]  Boris Polyak,et al.  Acceleration of stochastic approximation by averaging , 1992 .

[12]  Bin Yu Assouad, Fano, and Le Cam , 1997 .

[13]  Stephen E. Fienberg,et al.  Disclosure limitation using perturbation and related methods for categorical data , 1998 .

[14]  E. Lehmann Elements of large-sample theory , 1998 .

[15]  Yuhong Yang,et al.  Information-theoretic determination of minimax rates of convergence , 1999 .

[16]  Lianfen Qian,et al.  Nonparametric Curve Estimation: Methods, Theory, and Applications , 1999, Technometrics.

[17]  Trevor Hastie,et al.  The Elements of Statistical Learning , 2001 .

[18]  Alexandre V. Evfimievski,et al.  Limiting privacy breaches in privacy preserving data mining , 2003, PODS.

[19]  Roger Levy,et al.  Is it Harder to Parse Chinese, or the Chinese Treebank? , 2003, ACL.

[20]  Thomas M. Cover,et al.  Elements of Information Theory: Cover/Elements of Information Theory, Second Edition , 2005 .

[21]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[22]  Cynthia Dwork,et al.  Privacy, accuracy, and consistency too: a holistic solution to contingency table release , 2007, PODS.

[23]  L. Wasserman,et al.  A Statistical Framework for Differential Privacy , 2008, 0811.2501.

[24]  Adam D. Smith,et al.  Composition attacks and auxiliary information in data privacy , 2008, KDD.

[25]  A. Blum,et al.  A learning theory approach to non-interactive database privacy , 2008, STOC.

[26]  Sofya Raskhodnikova,et al.  What Can We Learn Privately? , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[27]  Eran Omri,et al.  Distributed Private Data Analysis: On Simultaneously Solving How and What , 2008, CRYPTO.

[28]  Martin J. Wainwright,et al.  Information-theoretic lower bounds on the oracle complexity of convex optimization , 2009, NIPS.

[29]  Alexandre B. Tsybakov,et al.  Introduction to Nonparametric Estimation , 2008, Springer series in statistics.

[30]  Martin J. Wainwright,et al.  A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers , 2009, NIPS.

[31]  Alexander Shapiro,et al.  Stochastic Approximation approach to Stochastic Programming , 2013 .

[32]  Cynthia Dwork,et al.  Differential privacy and robust statistics , 2009, STOC '09.

[33]  Kunal Talwar,et al.  On the geometry of differential privacy , 2009, STOC '10.

[34]  Toniann Pitassi,et al.  The Limits of Two-Party Differential Privacy , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[35]  Guy N. Rothblum,et al.  A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[36]  Stephen E. Fienberg,et al.  Differential Privacy and the Risk-Utility Tradeoff for Multi-dimensional Contingency Tables , 2010, Privacy in Statistical Databases.

[37]  Guy N. Rothblum,et al.  Boosting and Differential Privacy , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[38]  Haim H. Permuter,et al.  Interpretations of Directed Information in Portfolio Theory, Data Compression, and Hypothesis Testing , 2009, IEEE Transactions on Information Theory.

[39]  C. Sbardella High dimensional regression , 2011 .

[40]  Andrew P Morris,et al.  Basic statistical analysis in genetic case-control studies , 2011, Nature Protocols.

[41]  Sara van de Geer,et al.  Statistics for High-Dimensional Data , 2011 .

[42]  Po-Ling Loh,et al.  High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity , 2011, NIPS.

[43]  Adam D. Smith,et al.  Privacy-preserving statistical estimation with optimal convergence rates , 2011, STOC '11.

[44]  Anand D. Sarwate,et al.  Differentially Private Empirical Risk Minimization , 2009, J. Mach. Learn. Res..

[45]  Sara van de Geer,et al.  Statistics for High-Dimensional Data: Methods, Theory and Applications , 2011 .

[46]  Jing Lei,et al.  Differentially Private M-Estimators , 2011, NIPS.

[47]  Ling Huang,et al.  Learning in a Large Function Space: Privacy-Preserving Mechanisms for SVM Learning , 2009, J. Priv. Confidentiality.

[48]  Stephen E. Fienberg,et al.  Differential Privacy for Protecting Multi-dimensional Contingency Table Data: Extensions and Applications , 2012, J. Priv. Confidentiality.

[49]  Vishesh Karwa,et al.  Inference using noisy degrees: Differentially private $\beta$-model and synthetic graphs , 2012, 1205.4697.

[50]  Emmanuel J. Candès,et al.  On the Fundamental Limits of Adaptive Sensing , 2011, IEEE Transactions on Information Theory.

[51]  K. Fu,et al.  Assessing Censorship on Microblogs in China: Discriminatory Keyword Analysis and Impact Evaluation of the 'Real Name Registration' Policy , 2013 .

[52]  Larry A. Wasserman,et al.  Random Differential Privacy , 2011, J. Priv. Confidentiality.

[53]  Margaret E. Roberts,et al.  How Censorship in China Allows Government Criticism but Silences Collective Expression , 2013, American Political Science Review.

[54]  Venkat Anantharam,et al.  On Maximal Correlation, Hypercontractivity, and the Data Processing Inequality studied by Erkip and Cover , 2013, ArXiv.

[55]  Martin J. Wainwright,et al.  Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation , 2013, NIPS.

[56]  Amos Beimel,et al.  Bounds on the sample complexity for private learning and private data release , 2010, Machine Learning.

[57]  Michael Chau,et al.  Assessing Censorship on Microblogs in China: Discriminatory Keyword Analysis and the Real-Name Registration Policy , 2013, IEEE Internet Computing.

[58]  Martin J. Wainwright,et al.  Distance-based and continuum Fano inequalities with applications to statistical estimation , 2013, ArXiv.

[59]  Martin J. Wainwright,et al.  Privacy Aware Learning , 2012, JACM.

[60]  Pramod Viswanath,et al.  Extremal Mechanisms for Local Differential Privacy , 2014, J. Mach. Learn. Res..

[61]  Margaret E. Roberts,et al.  Reverse-engineering censorship in China: Randomized experimentation and participant observation , 2014, Science.