Autonomous Visualization

Many classification algorithms are difficult for humans to interpret, so using them to solve real-world problems often requires blind faith in the model. In this paper we present a novel approach to classification that takes interpretability and visualization of the results into account. We attempt to efficiently discover the most relevant snapshot of the data, in the form of a two-dimensional scatter plot with easily understandable axes, and then use this plot as the basis for a classification algorithm. Furthermore, we investigate the trade-off between classification accuracy and interpretability by comparing the performance of our classifier on real data with that of several traditional classifiers. Evaluating our algorithm on a wide range of canonical data sets, we find that in most cases additional interpretability can be obtained with little or no loss in classification accuracy.
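To make the idea concrete, the following is a minimal sketch of one way such a plot could be found and used. It is not the method of the paper, whose search procedure and scoring function are not given in this abstract. The sketch assumes that "easily understandable axes" means pairs of original features, that candidate plots are enumerated exhaustively, and that each plot is scored by leave-one-out 1-nearest-neighbour accuracy. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def loo_1nn_accuracy(points, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour rule on 2-D points."""
    correct = 0
    for i, (xi, yi) in enumerate(points):
        best_j, best_d = None, float("inf")
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            d = (xi - xj) ** 2 + (yi - yj) ** 2
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(points)


def best_axis_pair(rows, labels):
    """Assumed search: try every pair of original features, keep the best-scoring one."""
    best_pair, best_score = None, -1.0
    for a, b in combinations(range(len(rows[0])), 2):
        points = [(r[a], r[b]) for r in rows]
        score = loo_1nn_accuracy(points, labels)
        if score > best_score:
            best_pair, best_score = (a, b), score
    return best_pair, best_score


def classify(query, rows, labels, pair):
    """Label a query by its nearest training point in the chosen scatter plot."""
    a, b = pair
    nearest = min(
        range(len(rows)),
        key=lambda i: (rows[i][a] - query[a]) ** 2 + (rows[i][b] - query[b]) ** 2,
    )
    return labels[nearest]


# Hypothetical toy data: features 1 and 3 jointly determine the class
# (an XOR-like pattern); features 0 and 2 are uninformative.
ROWS = [
    [0.2, -1.0, 0.3, -1.0],
    [0.8, -0.9, 0.7, -1.1],
    [0.5, 1.0, 0.3, 1.0],
    [0.1, 1.1, 0.7, 0.9],
    [0.2, -1.0, 0.3, 1.0],
    [0.8, -1.1, 0.7, 0.9],
    [0.5, 1.0, 0.3, -1.0],
    [0.1, 0.9, 0.7, -1.1],
]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    pair, score = best_axis_pair(ROWS, LABELS)
    print("chosen axes:", pair, "leave-one-out accuracy:", score)
    print("prediction:", classify([0.4, 0.95, 0.5, 1.05], ROWS, LABELS, pair))
```

On this toy data the search selects features 1 and 3, the only pair whose scatter plot separates the two classes. Exhaustive enumeration costs one scoring pass per feature pair, so a practical system would likely need a cheaper search; the abstract's mention of efficiency suggests the paper addresses this, but the details are not stated here.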
