On semi-supervised clustering via multiobjective optimization

Semi-supervised classification uses aspects of both unsupervised and supervised learning to improve upon the performance of traditional classification methods. Semi-supervised clustering, in particular, explicitly integrates information about both the data distribution and class memberships into the clustering process. In this paper, the potential of a multiobjective formulation of the semi-supervised clustering problem is explored, and two evolutionary multiobjective approaches to the problem are outlined. Experimental results demonstrate practical performance benefits of this methodology, including improved classification performance and increased robustness to annotation errors.
