Semisupervised Classification With Cluster Regularization

Semisupervised classification (SSC) learns to predict the labels of test instances from limited labeled data together with cheap, abundant unlabeled data. To exploit the information in unlabeled data, one must assume a relationship between the true class structure and the data distribution. A common choice is the cluster assumption: data points that cluster together are likely to share the same class label. In this paper, we propose a new SSC algorithm, cluster-based regularization (ClusterReg), which uses the partition produced by a clustering algorithm as a regularization term in the loss function of an SSC classifier. ClusterReg thus makes predictions that respect the cluster structure while remaining consistent with the limited labeled data. Experiments confirm that ClusterReg generalizes well on real-world problems: its performance is strongest when the data follow the cluster assumption, and even when the clusters overlap in misleading ways, it still outperforms other state-of-the-art algorithms.
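
The regularization idea in the abstract can be made concrete with a small sketch. The code below is only an illustration of the general scheme, not the paper's actual formulation: the squared deviation from each cluster's mean prediction, the weight lam, and all function and variable names are assumptions introduced here for exposition.

import numpy as np
from sklearn.cluster import KMeans

def cluster_regularized_loss(probs, y_onehot, labeled_mask, clusters, lam=1.0):
    """Supervised cross-entropy on the labeled points plus a penalty
    that pulls each point's predicted class distribution toward the
    mean prediction of its cluster (an illustrative stand-in for the
    cluster-based regularizer described in the abstract)."""
    eps = 1e-12
    # Supervised term: cross-entropy averaged over the labeled subset.
    ce = -np.sum(y_onehot[labeled_mask] * np.log(probs[labeled_mask] + eps))
    ce /= max(int(labeled_mask.sum()), 1)

    # Cluster term: within each cluster, penalize the squared deviation
    # of every prediction from the cluster's mean prediction, so points
    # grouped together by the clustering algorithm receive similar labels.
    reg = 0.0
    for c in np.unique(clusters):
        members = probs[clusters == c]
        reg += np.sum((members - members.mean(axis=0)) ** 2)
    reg /= len(probs)

    return ce + lam * reg

# Toy usage: six 2-D points forming two clusters, one labeled point per
# cluster, and a deliberately uninformative classifier output.
X = np.array([[0.0, 0.0], [0.0, 1.0], [0.2, 0.5],
              [5.0, 5.0], [5.0, 6.0], [5.2, 5.5]])
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
probs = np.full((6, 2), 0.5)
y_onehot = np.zeros((6, 2))
y_onehot[0, 0] = y_onehot[3, 1] = 1.0
labeled_mask = np.array([True, False, False, True, False, False])
print(cluster_regularized_loss(probs, y_onehot, labeled_mask, clusters, lam=0.5))

In a training loop, minimizing this combined loss would push unlabeled points toward the label distribution of their cluster while the labeled points anchor each cluster to a class, which is the intuition the abstract describes.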
