Hard and Soft Euclidean Consensus Partitions

Euclidean partition dissimilarity $d(P, \tilde{P})$ (Dimitriadou et al., 2002) is defined as the square root of the minimal sum of squared differences of the class membership values of the partitions $P$ and $\tilde{P}$, with the minimum taken over all matchings between the classes of the two partitions. We first discuss some theoretical properties of this dissimilarity measure. We then consider the Euclidean consensus problem for partition ensembles, i.e., the problem of finding a hard or soft partition $P$ with a given number of classes which minimizes the (possibly weighted) sum $\sum_b w_b \, d(P_b, P)^2$ of squared Euclidean dissimilarities $d$ between $P$ and the elements $P_b$ of the ensemble. This problem is NP-hard and is related to the consensus problems studied in Gordon and Vichi (2001). We present an efficient "Alternating Optimization" (AO) heuristic for finding $P$, which alternates between optimally rematching classes for fixed memberships and optimizing class memberships for fixed matchings. An implementation of such AO algorithms for consensus partitions is available in the R extension package clue. We illustrate the algorithm on two data sets employed by Gordon and Vichi (2001): the popular Rosenberg-Kim kinship terms data and a macroeconomic data set.
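The dissimilarity and the two alternating steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation in clue: partitions are represented as n-by-k membership matrices (lists of rows) that all have the same number of classes k, the optimal matching is found by brute force over all k! column permutations (practical only for small k; a linear assignment solver would normally be used instead), and only the soft consensus case is covered, where the weighted mean of the matched memberships is taken as the membership update. The function names `best_matching`, `euclidean_dissimilarity`, and `soft_consensus` are hypothetical.

```python
from itertools import permutations
from math import sqrt


def best_matching(M, P):
    """Return (perm, cost): the column permutation of M that minimizes the
    sum of squared membership differences to P, and that minimal sum.
    Brute force over all k! permutations (assumption: small k)."""
    n, k = len(P), len(P[0])
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(k)):
        cost = sum((M[i][perm[j]] - P[i][j]) ** 2
                   for i in range(n) for j in range(k))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


def euclidean_dissimilarity(M, P):
    """Square root of the minimal sum of squared membership differences."""
    return sqrt(best_matching(M, P)[1])


def soft_consensus(ensemble, weights=None, max_iter=50, tol=1e-12):
    """Alternating optimization sketch for a soft consensus partition.
    Returns (P, criterion), where criterion = sum_b w_b * d(P_b, P)^2."""
    n, k = len(ensemble[0]), len(ensemble[0][0])
    if weights is None:
        weights = [1.0] * len(ensemble)
    total = sum(weights)
    P = [row[:] for row in ensemble[0]]  # start from the first partition
    prev = float("inf")
    crit = prev
    for _ in range(max_iter):
        # Step 1: optimally rematch the classes of each P_b to the current P.
        perms = [best_matching(M, P)[0] for M in ensemble]
        # Step 2: for fixed matchings, update memberships to the weighted
        # mean of the matched membership matrices (rows still sum to 1).
        P = [[sum(w * M[i][p[j]] for w, M, p in zip(weights, ensemble, perms))
              / total for j in range(k)] for i in range(n)]
        crit = sum(w * best_matching(M, P)[1]
                   for w, M in zip(weights, ensemble))
        if prev - crit <= tol:  # stop once the criterion no longer decreases
            break
        prev = crit
    return P, crit


# Three hard partitions of three objects into two classes;
# B is A with its class labels switched.
A = [[1, 0], [1, 0], [0, 1]]
B = [[0, 1], [0, 1], [1, 0]]
C = [[1, 0], [0, 1], [0, 1]]
P, crit = soft_consensus([A, B, C])
```

Because each step can only lower the criterion or leave it unchanged, the iteration stops at a fixed point of the two steps, which need not be a global minimum; restarting from several initial partitions is a common precaution.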

[1]  Roger N. Shepard, et al. Multidimensional scaling: theory and applications in the behavioral sciences, 1974.

[2]  Kurt Hornik, et al. A CLUE for CLUster Ensembles, 2005.

[3]  Moonja P. Kim, et al. The Method of Sorting as a Data-Gathering Procedure in Multivariate Research, 1975, Multivariate behavioral research.

[4]  Kenneth Steiglitz, et al. Combinatorial Optimization: Algorithms and Complexity, 1981.

[5]  J. Rubin. Optimal classification into groups: an approach for solving the taxonomy problem, 1967, Journal of theoretical biology.

[6]  Bernard Monjardet, et al. The median procedure in cluster analysis and social choice theory, 1981, Math. Soc. Sci.

[7]  Lloyd G. Humphreys, et al. Multivariate Applications in the Social Sciences, 1982.

[8]  Hongyuan Zha, et al. A new Mallows distance based metric for comparing clusterings, 2005, ICML '05.

[9]  Hans-Hermann Bock, et al. Classification and Related Methods of Data Analysis, 1988.

[10]  Alain Guénoche, et al. Maximum Transfer Distance Between Partitions, 2006, J. Classif.

[11]  Maurizio Vichi, et al. Fuzzy partition models for fitting a set of partitions, 2001.

[12]  A. D. Gordon, et al. Partitions of Partitions, 1998.

[13]  Kurt Hornik, et al. A Combination Scheme for Fuzzy Clustering, 2002, AFSS.

[14]  Dan Gusfield, et al. Partition-distance: A problem and class of perfect graphs arising in clustering, 2002, Inf. Process. Lett.

[15]  Martin Schader, et al. Clusterwise aggregation of relations, 1988.