Dissimilarity Plots: A Visual Exploration Tool for Partitional Clustering

For hierarchical clustering, dendrograms are a convenient and powerful visualization technique. Although many visualization methods have been suggested for partitional clustering, their usefulness deteriorates quickly as the dimensionality of the data increases, or they fail to represent structure between and within clusters simultaneously. In this article we extend (dissimilarity) matrix shading with several reordering steps based on seriation techniques. Both ideas, matrix shading and reordering, have been well known for a long time. However, only recent algorithmic improvements allow us to solve or approximately solve the seriation problem efficiently for larger problems. Furthermore, seriation techniques are used in a novel stepwise process (within each cluster and between clusters) which leads to a visualization technique that is able to present the structure between clusters and the micro-structure within clusters in one concise plot. This not only allows us to judge cluster quality but also makes misspecification of the number of clusters apparent. We give a detailed discussion of the construction of dissimilarity plots and demonstrate their usefulness with several examples. Experiments show that dissimilarity plots scale very well with increasing data dimensionality. Supplemental materials with additional experiments for this article are available online.
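
The stepwise reordering described above (between clusters, then within each cluster) can be illustrated with a minimal Python sketch. This is not the article's implementation: the greedy nearest-neighbor chain in `greedy_path` is a simple stand-in for a proper seriation method, average between-cluster dissimilarity is an assumed choice for the cluster-level matrix, and all function names are hypothetical.

```python
import numpy as np


def pairwise_euclidean(X):
    """Return the full n x n Euclidean dissimilarity matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def greedy_path(D):
    """Order items with a greedy nearest-neighbor chain.

    Assumption: this is only a cheap stand-in for a real seriation method.
    """
    n = D.shape[0]
    if n <= 2:
        return list(range(n))
    # Start at one end of the most dissimilar pair.
    start = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order, remaining = [start], set(range(n)) - {start}
    while remaining:
        last = order[-1]
        # Append the closest unplaced item (ties broken by index).
        nxt = min(remaining, key=lambda j: (D[last, j], j))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def dissimilarity_plot_order(D, labels):
    """Return (object permutation, ordered cluster labels) for a partition."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    members = [np.flatnonzero(labels == c) for c in clusters]
    k = len(clusters)
    # Between-cluster step: average dissimilarity for each pair of clusters.
    B = np.zeros((k, k))
    for a in range(k):
        for b in range(k):
            if a != b:
                B[a, b] = D[np.ix_(members[a], members[b])].mean()
    cluster_order = greedy_path(B)
    # Within-cluster step: order each cluster's objects on its own sub-matrix.
    perm = []
    for a in cluster_order:
        idx = members[a]
        inner = greedy_path(D[np.ix_(idx, idx)])
        perm.extend(idx[inner].tolist())
    return np.array(perm), clusters[cluster_order]


# Illustrative data: three synthetic clusters in 10 dimensions, shuffled.
rng = np.random.default_rng(0)
centers = np.zeros((3, 10))
centers[1, :] = 5.0
centers[2, 0] = 10.0
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 10)) for c in centers])
labels = np.repeat([0, 1, 2], 20)
shuffle = rng.permutation(len(X))
X, labels = X[shuffle], labels[shuffle]

D = pairwise_euclidean(X)
perm, cluster_order = dissimilarity_plot_order(D, labels)
D_reordered = D[np.ix_(perm, perm)]  # matrix to be shaded
```

`D_reordered` can then be shaded with any image routine, for example matplotlib's `imshow`. Because only the dissimilarity matrix is reordered, the size of the display depends on the number of objects rather than on the dimensionality of the data.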
