Binarization With Boosting and Oversampling for Multiclass Classification

Using a set of binary classifiers to solve multiclass classification problems has long been a popular approach. The decision boundaries learned by binary classifiers (also called base classifiers) are much simpler than those learned by multiclass classifiers. This paper proposes a new classification framework, termed binarization with boosting and oversampling (BBO), for efficiently solving multiclass classification problems. The framework is built on the one-versus-all (OVA) binarization technique. Unlike most previous work, BBO employs boosting to handle hard-to-learn instances and oversampling to handle the class imbalance that OVA binarization introduces. The framework has been tested extensively on several multiclass supervised and semi-supervised classification problems using different base classifiers, including neural networks, C4.5, k-nearest neighbor, repeated incremental pruning to produce error reduction (RIPPER), support vector machine, random forest, and learning with local and global consistency. Experimental results show that BBO can outperform its counterparts on both supervised and semi-supervised classification problems.
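The three ingredients named in the abstract (OVA binarization, oversampling to rebalance each binary problem, and boosting) can be illustrated with a minimal Python sketch. It is not the authors' BBO implementation: the abstract does not say which oversampling or boosting scheme is used, so the sketch assumes plain random duplication of minority examples and AdaBoost over decision stumps. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def binarize_ova(labels, positive):
    """Relabel a multiclass target as +1 (chosen class) versus -1 (all others)."""
    return [1 if y == positive else -1 for y in labels]

def random_oversample(X, y, rng):
    """Assumed scheme: duplicate random minority examples until classes are equal."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    pool = [i for i, label in enumerate(y) if label == minority]
    idx = list(range(len(y))) + [rng.choice(pool) for _ in range(deficit)]
    return [X[i] for i in idx], [y[i] for i in idx]

def stump_predict(x, j, t, s):
    """Decision stump: output s when feature j is at least t, otherwise -s."""
    return s if x[j] >= t else -s

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for s in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if stump_predict(x, j, t, s) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    return best

def adaboost(X, y, rounds=5):
    """Assumed booster: AdaBoost, which up-weights misclassified examples each round."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        err, j, t, s = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, s))
        w = [wi * math.exp(-alpha * yi * stump_predict(x, j, t, s))
             for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model, x):
    """Weighted vote of the stumps; larger means more confident in the positive class."""
    return sum(a * stump_predict(x, j, t, s) for a, j, t, s in model)

def fit_ova(X, labels, rounds=5, seed=0):
    """Train one oversampled, boosted binary model per class."""
    rng = random.Random(seed)
    models = {}
    for c in sorted(set(labels)):
        Xb, yb = random_oversample(X, binarize_ova(labels, c), rng)
        models[c] = adaboost(Xb, yb, rounds)
    return models

def predict(models, x):
    """Return the class whose binary model gives the highest score."""
    return max(models, key=lambda c: score(models[c], x))

# Toy data (hypothetical): class "c" has 2 examples against 9 in the rest.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [1, 0.5],
     [9, 0], [10, 1], [9, 1],
     [5, 8], [5, 9]]
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 2
models = fit_ova(X, labels)
print(predict(models, [5, 8.5]))
```

The toy data shows why the oversampling step matters: binarizing for class "c" yields 2 positives against 9 negatives even though the original three-class problem is only mildly skewed. A synthetic oversampler or a different base learner could be swapped in without changing the overall structure.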
