Efficient Cluster-Based Boosting for Semisupervised Classification

Semisupervised classification (SSC) uses both labeled and unlabeled data to train a classifier for unseen instances. Because unlabeled data are typically abundant, SSC algorithms must be able to handle large-scale data sets. Recently, various ensemble algorithms have been introduced that generalize better than single classifiers. However, existing ensemble methods cannot handle typical large-scale data sets. We propose efficient cluster-based boosting (ECB), a multiclass SSC algorithm with cluster-based regularization that avoids generating decision boundaries in high-density regions. A semisupervised selection procedure reduces time and space complexity by selecting only the most informative unlabeled instances for training each base learner. We provide evidence that ECB achieves good performance with small amounts of selected data and a relatively small number of base learners. Our experiments confirm that ECB scales to large data sets while delivering generalization comparable to state-of-the-art methods.
