Prototype selection for interpretable classification

We present a method for selecting prototypes in the classification setting, in which the samples fall into known discrete categories. The method is derived from three basic properties that we believe a good prototype set should satisfy. This intuition is translated into a set cover optimization problem, which we solve approximately using standard approaches. While prototype selection is usually viewed purely as a means toward building an efficient classifier, in this paper we emphasize the inherent value of having a set of prototypical elements. That said, by applying the nearest-neighbor rule to the set of prototypes, our method can also be evaluated as a classifier. We demonstrate the interpretive value of producing prototypes on the well-known USPS ZIP code digits data set and show that the method performs reasonably well as a classifier. We also apply the method to a proteomics data set in which the samples are strings and therefore not naturally embedded in a vector space. Our method is compatible with any dissimilarity measure, making it applicable in situations where a non-Euclidean metric is desirable or even necessary.
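To make the pipeline concrete, the sketch below illustrates the kind of greedy set-cover heuristic the abstract alludes to, written in Python with NumPy. All names here (`select_prototypes`, `predict_nn`, the ball-radius parameter `eps`) and the specific gain-minus-cost scoring are illustrative assumptions rather than the paper's exact formulation; the only inputs are a pairwise dissimilarity matrix and class labels, consistent with the claim that any dissimilarity measure can be used.

```python
import numpy as np

def select_prototypes(D, y, eps, max_prototypes=None):
    """Greedily pick prototypes whose eps-ball covers many still-uncovered
    same-class points while covering few points of other classes.

    D   : (n, n) pairwise dissimilarity matrix (need not be Euclidean)
    y   : (n,) array of class labels
    eps : ball radius, in units of the chosen dissimilarity
    """
    n = len(y)
    covered = np.zeros(n, dtype=bool)
    prototypes = []
    while max_prototypes is None or len(prototypes) < max_prototypes:
        best_score, best_j = 0, None
        for j in range(n):
            in_ball = D[j] <= eps
            # Newly covered same-class points minus other-class points in the ball.
            gain = np.sum(in_ball & ~covered & (y == y[j]))
            cost = np.sum(in_ball & (y != y[j]))
            if gain - cost > best_score:
                best_score, best_j = gain - cost, j
        if best_j is None:  # no candidate yields positive net coverage
            break
        prototypes.append(best_j)
        covered |= (D[best_j] <= eps) & (y == y[best_j])
    return np.array(prototypes)

def predict_nn(D_test_to_proto, proto_labels):
    """Nearest-neighbor rule on the prototype set: each test sample
    receives the label of its closest prototype."""
    return proto_labels[np.argmin(D_test_to_proto, axis=1)]
```

Because the routine never touches the raw features, only `D`, it works equally well for vector data (e.g., a Euclidean distance matrix over the USPS digits) and for the string-valued proteomics samples, where an edit distance or kernel-induced dissimilarity can be substituted.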
