ModelTracker: Redesigning Performance Analysis Tools for Machine Learning

Model building in machine learning is an iterative process. The performance analysis and debugging step typically involves a disruptive cognitive switch from model building to error analysis, discouraging an informed approach to model building. We present ModelTracker, an interactive visualization that subsumes information contained in numerous traditional summary statistics and graphs while displaying example-level performance and enabling direct error examination and debugging. Usage analysis from machine learning practitioners building real models with ModelTracker over six months shows ModelTracker is used often and throughout model building. A controlled experiment focusing on ModelTracker's debugging capabilities shows participants prefer ModelTracker over traditional tools without a loss in model performance.
