Accuracy, Interpretability and Differential Privacy via Explainable Boosting

We show that adding differential privacy (DP) to Explainable Boosting Machines (EBMs), a recent method for training interpretable ML models, yields state-of-the-art accuracy while protecting privacy. Our experiments on multiple classification and regression datasets show that DP-EBMs suffer surprisingly little accuracy loss even with strong differential privacy guarantees. Beyond high accuracy, applying DP to EBMs has two further benefits: (a) the trained models provide exact global and local interpretability, which is often important in settings where differential privacy is needed; and (b) the models can be edited after training, without loss of privacy, to correct errors that the DP noise may have introduced.
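The two benefits can be illustrated with a small toy sketch. It is not the paper's training algorithm: it assumes an additive model stored as one table of per-bin scores per feature, and it adds uncalibrated Gaussian noise to those scores purely as a stand-in for DP noise, so it carries no privacy guarantee of its own. All names (`bin_index`, `explain`, `make_monotone_increasing`, the feature names and the numbers) are hypothetical. The sketch shows that the prediction decomposes exactly into per-feature contributions, and that a post-training edit only touches the already released scores, never the training data.

```python
import random


def bin_index(value, edges):
    """Return the index of the bin containing `value` (edges are sorted cut points)."""
    return sum(1 for edge in edges if value >= edge)


def add_gaussian_noise(scores, noise_scale, rng):
    """Stand-in for DP noise. The scale is NOT calibrated to any (epsilon, delta)."""
    return [score + rng.gauss(0.0, noise_scale) for score in scores]


def explain(shape_functions, bin_edges, row):
    """Local explanation: the score each feature contributes for this row."""
    return {
        name: shape_functions[name][bin_index(value, bin_edges[name])]
        for name, value in row.items()
    }


def predict(intercept, shape_functions, bin_edges, row):
    """The prediction is exactly the intercept plus the per-feature contributions."""
    return intercept + sum(explain(shape_functions, bin_edges, row).values())


def make_monotone_increasing(scores):
    """Crude post-training edit: replace each score by the running maximum."""
    edited, running_max = [], float("-inf")
    for score in scores:
        running_max = max(running_max, score)
        edited.append(running_max)
    return edited


rng = random.Random(0)
bin_edges = {"age": [30.0, 50.0], "dose": [1.0]}               # hypothetical features
clean_scores = {"age": [-0.4, 0.1, 0.6], "dose": [-0.2, 0.3]}  # hypothetical shapes
noisy_scores = {
    name: add_gaussian_noise(scores, 0.5, rng)
    for name, scores in clean_scores.items()
}
intercept = 0.05
row = {"age": 42.0, "dose": 2.5}

contributions = explain(noisy_scores, bin_edges, row)
prediction = predict(intercept, noisy_scores, bin_edges, row)

# The edit reads only the noisy scores, so it is pure post-processing.
edited_scores = dict(noisy_scores, age=make_monotone_increasing(noisy_scores["age"]))

print(contributions, prediction)
print(edited_scores["age"])
```

Because the edit is a function of the released model alone, it falls under the post-processing property of differential privacy: whatever guarantee held for the noisy scores still holds for the edited ones. The running maximum is only one simple choice of edit, picked here for brevity.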
