AnchorViz: Facilitating Semantic Data Exploration and Concept Discovery for Interactive Machine Learning

When building a classifier in interactive machine learning (iML), human knowledge about the target class can be a powerful resource for making the classifier robust to unseen items. The main challenge lies in finding unlabeled items that can help either discover or refine concepts for which the current classifier has no corresponding features (i.e., it has feature blindness). Yet it is unrealistic to ask humans to come up with an exhaustive list of items, especially for rare concepts that are hard to recall. This article presents AnchorViz, an interactive visualization that facilitates the discovery of prediction errors and previously unseen concepts through human-driven semantic data exploration. By creating example-based or dictionary-based anchors that represent concepts, users create a topology that (a) spreads data according to their similarity to the concepts and (b) surfaces prediction and label inconsistencies between semantically related data points. Once such inconsistencies and errors are discovered, users can encode the new information as labels or features and interact with the retrained classifier to validate their actions in an iterative loop. We evaluated AnchorViz through two user studies. Our results show that AnchorViz helps users discover more prediction errors than stratified random sampling and uncertainty sampling. Furthermore, in the early stages of a training task, an iML tool with AnchorViz can help users build classifiers comparable to those built with the same tool using uncertainty sampling and keyword search, but with fewer labels and more generalizable features. We discuss the exploration strategies observed during the two studies and how AnchorViz supports the discovery, labeling, and refinement of concepts through a sensemaking loop.
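The idea of anchors that spread items according to their similarity to concepts can be illustrated with a small sketch. It assumes a RadViz-style placement: anchors sit evenly on a unit circle, and each item lands at the similarity-weighted mean of the anchor positions. The abstract does not give the exact formula, so this layout is an assumption, and the two similarity functions are hypothetical stand-ins: `example_anchor` uses the maximum bag-of-words cosine similarity to the anchor's examples, and `dictionary_anchor` uses the fraction of dictionary terms that appear in the item.

```python
import math
from collections import Counter
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical anchor type: maps an item's text to a similarity in [0, 1].
Anchor = Callable[[str], float]


def _bag(text: str) -> Counter:
    """Lower-cased whitespace tokenization into a term-count vector."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def example_anchor(examples: Iterable[str]) -> Anchor:
    """Hypothetical example-based anchor: max cosine similarity to any example."""
    bags = [_bag(e) for e in examples]

    def score(text: str) -> float:
        item = _bag(text)
        return max((_cosine(item, b) for b in bags), default=0.0)

    return score


def dictionary_anchor(terms: Iterable[str]) -> Anchor:
    """Hypothetical dictionary-based anchor: fraction of terms present in the item."""
    vocab = {t.lower() for t in terms}

    def score(text: str) -> float:
        tokens = set(text.lower().split())
        return len(vocab & tokens) / len(vocab) if vocab else 0.0

    return score


def radviz_layout(items: Sequence[str], anchors: Sequence[Anchor]) -> List[Tuple[float, float]]:
    """Place each item at the similarity-weighted mean of anchor positions on the unit circle."""
    k = len(anchors)
    spots = [(math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k)) for i in range(k)]
    layout = []
    for text in items:
        weights = [anchor(text) for anchor in anchors]
        total = sum(weights)
        if total == 0:
            # No anchor matches: the item sits at the center of the circle.
            layout.append((0.0, 0.0))
            continue
        x = sum(w * sx for w, (sx, _) in zip(weights, spots)) / total
        y = sum(w * sy for w, (_, sy) in zip(weights, spots)) / total
        layout.append((x, y))
    return layout


if __name__ == "__main__":
    anchors = [example_anchor(["cheap flight tickets"]), dictionary_anchor(["recipe", "pasta"])]
    items = ["cheap flight deals", "pasta recipe", "cheap pasta", "hello world"]
    for text, (x, y) in zip(items, radviz_layout(items, anchors)):
        print(f"{text!r}: ({x:.2f}, {y:.2f})")
```

In this sketch an item that matches only one anchor is pulled onto that anchor, an item that matches several lands between them, and an item that matches none stays at the center. Because semantically related items end up close together, a user could then look for neighbors whose labels or predictions disagree, which is the kind of inconsistency the abstract describes.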
