Gamut: A Design Probe to Understand How Data Scientists Understand Machine Learning Models

Without good models and the right tools to interpret them, data scientists risk making decisions based on hidden biases, spurious correlations, and false generalizations. This has led to a rallying cry for model interpretability. Yet the concept of interpretability remains nebulous, such that researchers and tool designers lack actionable guidelines for how to incorporate interpretability into models and accompanying tools. Through an iterative design process with expert machine learning researchers and practitioners, we designed a visual analytics system, Gamut, to explore how interactive interfaces could better support model interpretation. Using Gamut as a probe, we investigated why and how professional data scientists interpret models, and how interface affordances can support data scientists in answering questions about model interpretability. Our investigation showed that interpretability is not a monolithic concept: data scientists have different reasons to interpret models and tailor explanations for specific audiences, often balancing competing concerns of simplicity and completeness. Participants also asked to use Gamut in their work, highlighting its potential to help data scientists understand their own data.
