A voting-merging clustering algorithm

In this paper we propose an unsupervised voting-merging scheme that clusters a data set and also estimates the number of clusters it contains. The voting part of the algorithm combines several runs of clustering algorithms into a common partition. This helps to overcome instabilities of the individual clustering algorithms and improves the ability to find structures in a data set. Moreover, we develop a strategy to understand, analyze and interpret these results. In the second part of the scheme, a merging procedure is applied to the clusters resulting from voting, in order to find the number of clusters in the data set.
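The voting idea can be illustrated with a minimal sketch. It is not the paper's exact procedure, which the abstract does not specify. The sketch assumes that every run produces hard labels in 0..k-1 for the same points, that labels are matched across runs by the permutation with the highest agreement, and that votes are averaged into a soft membership per point. The function names `align_labels` and `vote` are hypothetical.

```python
from itertools import permutations


def align_labels(reference, labels, k):
    """Relabel `labels` with the permutation of 0..k-1 that agrees most with `reference`.

    Exhaustive search over permutations: only sensible for small k (assumption).
    """
    best_perm, best_agree = None, -1
    for perm in permutations(range(k)):
        agree = sum(1 for r, l in zip(reference, labels) if perm[l] == r)
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [best_perm[l] for l in labels]


def vote(runs, k):
    """Combine several hard partitions into one soft partition by voting.

    runs: list of label lists, one per clustering run, all of equal length.
    Returns an n x k matrix whose row i holds the fraction of runs that
    placed point i in each (aligned) cluster.
    """
    n = len(runs[0])
    counts = [[0] * k for _ in range(n)]
    # The first run fixes the label names.
    for i, label in enumerate(runs[0]):
        counts[i][label] += 1
    for labels in runs[1:]:
        # Current majority label of each point, used as the matching target.
        consensus = [max(range(k), key=row.__getitem__) for row in counts]
        for i, label in enumerate(align_labels(consensus, labels, k)):
            counts[i][label] += 1
    m = len(runs)
    return [[c / m for c in row] for row in counts]


def hard_partition(memberships):
    """Assign each point to the cluster that received most of its votes."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]
```

Points whose membership row is far from 0 or 1 are those on which the runs disagree, which is one way to inspect how stable the common partition is. A merging step would then operate on the voted clusters; its criterion is not given in the abstract, so it is left out here.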
