Private Prediction Sets

In real-world settings involving consequential decision-making, deploying machine learning systems generally requires both reliable uncertainty quantification and protection of individuals’ privacy. We present a framework that treats these two desiderata jointly. Our framework is based on conformal prediction, a methodology that augments predictive models to return prediction sets that quantify uncertainty: they provably cover the true response with a user-specified probability, such as 90%. One might hope that, when used with privately trained models, conformal prediction would yield privacy guarantees for the resulting prediction sets; unfortunately, this is not the case. To remedy this key problem, we develop a method that takes any pre-trained predictive model and outputs differentially private prediction sets. Our method follows the general approach of split conformal prediction: we use holdout data to calibrate the size of the prediction sets, and we preserve privacy by using a privatized quantile subroutine. This subroutine compensates for the noise introduced for privacy so that correct coverage is still guaranteed. We evaluate the method with experiments on the CIFAR-10, ImageNet, and CoronaHack datasets.
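To make the calibration step concrete, here is a minimal Python sketch of its two ingredients: an ordinary split conformal threshold and a privatized version of the same quantile. It assumes classification scores of the form 1 - p(y | x) in [0, 1], a fixed grid of `num_bins` candidate thresholds, and the exponential mechanism as the private quantile routine. All function names and the `rank_slack` parameter are hypothetical. In particular, `rank_slack` is a crude stand-in for the noise compensation described above and is not the correction used in this work.

```python
import bisect
import math
import random


def split_conformal_threshold(cal_scores, alpha):
    """Non-private split conformal: the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points; every label is included
    return sorted(cal_scores)[k - 1]


def private_threshold(cal_scores, alpha, epsilon, num_bins=100, rank_slack=0, rng=None):
    """epsilon-DP threshold chosen by the exponential mechanism on a fixed grid over [0, 1].

    Utility of a grid point t is -|#{scores <= t} - target|. Replacing one calibration
    score changes each count by at most 1, so the utility has sensitivity 1.
    rank_slack (hypothetical) raises the target rank to offset the privacy noise.
    """
    rng = rng if rng is not None else random.Random()
    n = len(cal_scores)
    target = min(n, math.ceil((n + 1) * (1 - alpha)) + rank_slack)
    grid = [j / num_bins for j in range(1, num_bins + 1)]  # chosen without looking at data
    sorted_scores = sorted(cal_scores)
    utilities = [-abs(bisect.bisect_right(sorted_scores, t) - target) for t in grid]
    # Weights exp(epsilon * u / 2); subtracting the max utility keeps the largest weight at 1
    # so the weights cannot all underflow to zero.
    best = max(utilities)
    weights = [math.exp(epsilon * (u - best) / 2) for u in utilities]
    return rng.choices(grid, weights=weights, k=1)[0]


def prediction_set(class_probs, threshold):
    """Return every label y whose score 1 - p(y | x) is at most the threshold."""
    return [y for y, p in enumerate(class_probs) if 1 - p <= threshold]


if __name__ == "__main__":
    rng = random.Random(0)
    cal_scores = [rng.random() for _ in range(1000)]  # stand-in for 1 - p(true label | x)
    print(split_conformal_threshold(cal_scores, alpha=0.1))
    t = private_threshold(cal_scores, alpha=0.1, epsilon=1.0, rank_slack=20, rng=rng)
    print(t, prediction_set([0.7, 0.2, 0.1], t))
```

Only the threshold is computed from the holdout data, so prediction sets built from it inherit its privacy guarantee with respect to that data by post-processing. Privacy of the training data still depends on how the model itself was trained. Because the mechanism can land below the non-private quantile, some upward adjustment (here the hypothetical `rank_slack`) is needed to keep coverage from dropping.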
