AnchorViz: Facilitating Classifier Error Discovery through Interactive Semantic Data Exploration

When building a classifier in interactive machine learning, human knowledge about the target class can be a powerful reference for making the classifier robust to unseen items. The main challenge lies in finding unlabeled items that help discover or refine concepts for which the current classifier has no corresponding features (i.e., it has feature blindness). Yet it is unrealistic to ask humans to come up with an exhaustive list of items, especially for rare concepts that are hard to recall. This paper presents AnchorViz, an interactive visualization that facilitates error discovery through semantic data exploration. By creating example-based anchors, users define a topology that spreads data points according to their similarity to the anchors, and can then examine inconsistencies between data points that are semantically related. The results from our user study show that AnchorViz helps users discover more prediction errors than stratified random sampling and uncertainty sampling.
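The idea of spreading items by their similarity to anchors can be illustrated with a minimal sketch. The abstract does not specify the layout, so this assumes a RadViz-style placement: anchors sit evenly on a unit circle and each item is drawn at the similarity-weighted average of the anchor positions. The function names, the use of cosine similarity, and the toy feature vectors are all hypothetical choices for illustration, not the paper's implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def anchor_positions(n_anchors):
    """Place n anchors evenly on the unit circle, starting at angle 0."""
    return [
        (math.cos(2 * math.pi * k / n_anchors), math.sin(2 * math.pi * k / n_anchors))
        for k in range(n_anchors)
    ]


def place_item(item, anchors):
    """Return the 2-D position of an item as the similarity-weighted mean of anchor positions.

    Assumption: negative similarities are clipped to zero, and an item with no
    similarity to any anchor is drawn at the center of the circle.
    """
    positions = anchor_positions(len(anchors))
    weights = [max(0.0, cosine_similarity(item, a)) for a in anchors]
    total = sum(weights)
    if total == 0.0:
        return (0.0, 0.0)
    x = sum(w * px for w, (px, _) in zip(weights, positions)) / total
    y = sum(w * py for w, (_, py) in zip(weights, positions)) / total
    return (x, y)


# Toy example: three anchors, each represented by a hypothetical feature vector.
anchors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
same_as_first_anchor = place_item([1.0, 0.0, 0.0], anchors)
between_first_two = place_item([1.0, 1.0, 0.0], anchors)
unrelated = place_item([0.0, 0.0, 0.0], anchors)
```

Under this assumed layout, an item identical to one anchor lands on that anchor, an item equally similar to two anchors lands midway between them, and an item unrelated to every anchor stays at the center. Items that sit close together but receive different predictions would then be natural candidates for inspection.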
