Prototype selection for interpretable classification

We present a method for selecting prototypes in the classification setting, in which the samples fall into known discrete categories. The method is derived from three basic properties that we believe a good prototype set should satisfy. This intuition is translated into a set cover optimization problem, which we solve approximately using standard approaches. While prototype selection is usually viewed purely as a means toward building an efficient classifier, in this paper we emphasize the inherent value of having a set of prototypical elements. That said, by applying the nearest-neighbor rule to the set of prototypes, our method can also be evaluated as a classifier. We demonstrate the interpretive value of producing prototypes on the well-known USPS ZIP code digits data set and show that the method performs reasonably well as a classifier. We also apply the method to a proteomics data set in which the samples are strings and therefore not naturally embedded in a vector space. Our method is compatible with any dissimilarity measure, making it applicable in situations where a non-Euclidean metric is desirable or even necessary.
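To make the pipeline concrete, the sketch below illustrates the kind of greedy set-cover heuristic the abstract alludes to, written in Python with NumPy. All names here (`select_prototypes`, `predict_nn`, the ball-radius parameter `eps`) and the specific gain-minus-cost scoring are illustrative assumptions rather than the paper's exact formulation; the only inputs are a pairwise dissimilarity matrix and class labels, consistent with the claim that any dissimilarity measure can be used.

```python
import numpy as np

def select_prototypes(D, y, eps, max_prototypes=None):
    """Greedily pick prototypes whose eps-ball covers many still-uncovered
    same-class points while covering few points of other classes.

    D   : (n, n) pairwise dissimilarity matrix (need not be Euclidean)
    y   : (n,) array of class labels
    eps : ball radius, in units of the chosen dissimilarity
    """
    n = len(y)
    covered = np.zeros(n, dtype=bool)
    prototypes = []
    while max_prototypes is None or len(prototypes) < max_prototypes:
        best_score, best_j = 0, None
        for j in range(n):
            in_ball = D[j] <= eps
            # Newly covered same-class points minus other-class points in the ball.
            gain = np.sum(in_ball & ~covered & (y == y[j]))
            cost = np.sum(in_ball & (y != y[j]))
            if gain - cost > best_score:
                best_score, best_j = gain - cost, j
        if best_j is None:  # no candidate yields positive net coverage
            break
        prototypes.append(best_j)
        covered |= (D[best_j] <= eps) & (y == y[best_j])
    return np.array(prototypes)

def predict_nn(D_test_to_proto, proto_labels):
    """Nearest-neighbor rule on the prototype set: each test sample
    receives the label of its closest prototype."""
    return proto_labels[np.argmin(D_test_to_proto, axis=1)]
```

Because the routine never touches the raw features, only `D`, it works equally well for vector data (e.g., a Euclidean distance matrix over the USPS digits) and for the string-valued proteomics samples, where an edit distance or kernel-induced dissimilarity can be substituted.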
