Sibyl: Understanding and Addressing the Usability Challenges of Machine Learning in High-Stakes Decision Making

Machine learning (ML) is being applied to a diverse and ever-growing set of domains. In many cases, domain experts, who often have no expertise in ML or data science, are asked to use ML predictions to make high-stakes decisions. Multiple ML usability challenges can arise as a result, such as a lack of user trust in the model, an inability to reconcile human-ML disagreement, and ethical concerns about oversimplifying complex problems into a single algorithmic output. In this paper, we investigate the ML usability challenges that arise in the domain of child welfare screening through a series of collaborations with child welfare screeners. Following an iterative design process involving ML scientists, visualization researchers, and domain experts (child welfare screeners), we first identified four key ML challenges and homed in on one promising explainable ML technique, local factor contributions, to address them. We then implemented and evaluated Sibyl, a visual analytics tool that increases the interpretability and interactivity of local factor contributions. We demonstrate the tool's effectiveness through two formal user studies, one with 12 non-expert participants and one with 13 expert participants. From the feedback collected, we distill a list of design implications to guide researchers who aim to develop interpretable and interactive visualization tools for ML prediction models deployed for child welfare screeners and other similar domain experts.
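To make the notion of local factor contributions concrete, the sketch below shows one common way to compute them: SHAP values for a single case from a tree-based classifier. This is an illustrative example only, not the paper's actual pipeline; the dataset is synthetic and the feature names are hypothetical placeholders, not the real child-welfare features.

```python
# A minimal sketch of local factor contributions, computed here with SHAP on a
# tree ensemble. All data and feature names below are hypothetical examples.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
feature_names = ["prior_referrals", "household_size", "case_age_days"]  # hypothetical
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = GradientBoostingClassifier().fit(X, y)

# TreeExplainer computes per-feature SHAP contributions for tree models.
explainer = shap.TreeExplainer(model)
contributions = explainer.shap_values(X[:1])  # local contributions for one case

# Each value shows how much that factor pushed this case's risk score up or down.
for name, value in zip(feature_names, contributions[0]):
    print(f"{name}: {value:+.3f}")
```

A tool like Sibyl would render per-case contributions of this kind visually, rather than as raw numbers, so that screeners can see which factors drove a given risk score.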
