Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation

Black-box risk scoring models permeate our lives, yet are typically proprietary or opaque. We propose Distill-and-Compare, an approach to audit such models without probing the black-box model API or pre-defining features to audit. To gain insight into black-box models, we treat them as teachers, training transparent student models to mimic the risk scores assigned by the black-box models. We compare the mimic model trained with distillation to a second, un-distilled transparent model trained on ground truth outcomes, and use differences between the two models to gain insight into the black-box model. We demonstrate the approach on four data sets: COMPAS, Stop-and-Frisk, Chicago Police, and Lending Club. We also propose a statistical test to determine if a data set is missing key features used to train the black-box model. Our test finds that the ProPublica data is likely missing key feature(s) used in COMPAS.
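
The distill-then-compare idea can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration and is not taken from the paper: the feature names `group` and `priors`, the made-up black-box scores, and the toy "transparent model" (per-level mean deviations from the overall mean), which merely stands in for whatever transparent model class is actually used. The sketch fits one transparent model to the black-box scores (the mimic), fits a second to the ground truth outcomes, and reports the per-feature gaps between them.

```python
from collections import defaultdict
from statistics import mean

FEATURES = ("group", "priors")  # hypothetical feature names


def make_rows():
    """Balanced synthetic audit data (illustrative only).

    The made-up black-box score depends on both features, while the
    true outcome rate depends on `priors` alone.
    """
    rows = []
    for group in (0, 1):
        for priors in range(4):
            positives = 2 + priors  # positives out of 10 rows per cell
            for i in range(10):
                rows.append({
                    "group": group,
                    "priors": priors,
                    "score": 0.2 + 0.1 * priors + 0.15 * group,
                    "outcome": 1.0 if i < positives else 0.0,
                })
    return rows


def fit_effects(rows, target):
    """Toy transparent model: for each feature level, the mean of
    `target` at that level minus the overall mean of `target`."""
    overall = mean(r[target] for r in rows)
    effects = {}
    for feat in FEATURES:
        by_level = defaultdict(list)
        for r in rows:
            by_level[r[feat]].append(r[target])
        effects[feat] = {lvl: mean(vals) - overall
                         for lvl, vals in by_level.items()}
    return effects


def compare(mimic, truth):
    """Per-feature, per-level gap between the two transparent models."""
    return {feat: {lvl: mimic[feat][lvl] - truth[feat][lvl]
                   for lvl in mimic[feat]}
            for feat in mimic}


rows = make_rows()
mimic_model = fit_effects(rows, "score")      # distilled from the black box
outcome_model = fit_effects(rows, "outcome")  # trained on ground truth
gaps = compare(mimic_model, outcome_model)

for feat, by_level in gaps.items():
    for lvl, gap in sorted(by_level.items()):
        print(f"{feat}={lvl}: gap={gap:+.3f}")
```

In this toy setup the two models agree on `priors` and disagree on `group`, which is the kind of difference the comparison is meant to surface. A real audit would also need the scores and outcomes placed on a comparable scale before the two models are compared; the sketch sidesteps that step by constructing both on a 0 to 1 scale.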
