Hierarchical Clustering With Prototypes via Minimax Linkage

Agglomerative hierarchical clustering is a popular class of methods for understanding the structure of a dataset. The nature of the clustering depends on the choice of linkage—that is, on how one measures the distance between clusters. In this article we investigate minimax linkage, a recently introduced but little-studied linkage. Minimax linkage is unique in naturally associating a prototype chosen from the original dataset with every interior node of the dendrogram. These prototypes can be used to greatly enhance the interpretability of a hierarchical clustering. Furthermore, we prove that minimax linkage has a number of desirable theoretical properties; for example, minimax-linkage dendrograms cannot have inversions (unlike centroid-linkage dendrograms), and minimax linkage is robust against certain perturbations of a dataset. We provide an efficient implementation and illustrate minimax linkage’s strengths as a data analysis and visualization tool on a study of words from encyclopedia articles and on a dataset of images of human faces.
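To make the idea concrete, the following is a minimal sketch of agglomerative clustering with minimax linkage: the distance between two clusters is the smallest radius of a ball, centered at one of the clusters' own points, that covers their union, and that center point becomes the merged cluster's prototype. This is a naive illustrative implementation, not the efficient one provided by the authors; the function names and the merge-history output format are ours.

```python
import numpy as np

def minimax_radius(points):
    """Return (r, j): the minimax radius of `points` and the index of the
    prototype, i.e., the point whose maximum distance to the rest is smallest."""
    diff = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diff ** 2).sum(-1))   # pairwise Euclidean distances
    max_d = dists.max(axis=1)              # each point's distance to its farthest peer
    j = int(max_d.argmin())
    return max_d[j], j

def minimax_linkage(X):
    """Naive agglomerative clustering with minimax linkage (O(n^3) per merge).

    Returns the merge history as a list of tuples
    (members_of_cluster_A, members_of_cluster_B, minimax_radius, prototype_index),
    where the prototype is always an index into the original dataset X.
    """
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                idx = clusters[a] + clusters[b]
                r, j = minimax_radius(X[idx])
                if best is None or r < best[0]:
                    best = (r, a, b, idx[j])
        r, a, b, proto = best
        merges.append((clusters[a], clusters[b], r, proto))
        clusters[a] = clusters[a] + clusters[b]   # merge the two closest clusters
        del clusters[b]
    return merges
```

On a toy dataset of two well-separated pairs, the first two merges join the near pairs (radius 1), and the final merge covers all four points at the minimax radius; each interior node carries an original data point as its prototype, which is the interpretability property the abstract highlights.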
