Oblique Decision Tree Ensemble via Multisurface Proximal Support Vector Machine

A new approach to generating oblique decision tree ensembles is proposed in which the decision hyperplane at each internal node of a tree classifier is not necessarily orthogonal to a feature axis. At each internal node, the training samples are grouped into two hyper-classes according to their geometric properties, based on a randomly selected feature subset. A multisurface proximal support vector machine is then used to obtain two clustering hyperplanes, each generated so that it is as close as possible to one group of the data and as far as possible from the other. One of the bisectors of these two hyperplanes is taken as the test hyperplane for that internal node. Several regularization methods are applied to handle the small-sample-size problem that arises as the tree grows. The effectiveness of the proposed method is demonstrated on 44 real-world benchmark classification data sets from various research fields. The classification results show the advantage of the proposed approach in both computation time and classification accuracy.
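The node-splitting step described in the abstract can be illustrated with a minimal sketch. It makes several assumptions that are not stated in the abstract: the two groups are given directly as binary labels (the hyper-class grouping and the random feature subset are omitted), each proximal hyperplane is found by minimizing a ratio of squared distances as a generalized eigenvalue problem, a Tikhonov term with a hypothetical weight `delta` stands in for the regularization methods, and the bisector is chosen by Gini impurity. The function names `proximal_plane`, `gini_split`, and `mpsvm_split` are illustrative only.

```python
import numpy as np

def proximal_plane(A, B, delta=1e-3):
    """Return (w, b) for a plane w.x - b = 0 close to rows of A, far from rows of B."""
    # Augment with a -1 column so that [x, -1] @ [w, b] = w.x - b.
    Ae = np.hstack([A, -np.ones((A.shape[0], 1))])
    Be = np.hstack([B, -np.ones((B.shape[0], 1))])
    eye = np.eye(Ae.shape[1])
    G = Ae.T @ Ae + delta * eye  # squared distances to group A (regularized)
    H = Be.T @ Be + delta * eye  # squared distances to group B (regularized)
    # Minimize z'Gz / z'Hz: reduce G z = lambda H z to a symmetric problem
    # through the Cholesky factor of H, then take the smallest eigenvalue.
    L = np.linalg.cholesky(H)
    Linv = np.linalg.inv(L)
    _, vecs = np.linalg.eigh(Linv @ G @ Linv.T)  # eigenvalues in ascending order
    z = Linv.T @ vecs[:, 0]
    w, b = z[:-1], z[-1]
    norm = np.linalg.norm(w)
    return w / norm, b / norm  # unit normal, so bisectors are simple sums

def gini_split(side, y):
    """Weighted Gini impurity of the two children defined by the boolean mask `side`."""
    total = 0.0
    for mask in (side, ~side):
        n = mask.sum()
        if n == 0:
            continue
        _, counts = np.unique(y[mask], return_counts=True)
        p = counts / n
        total += n / len(y) * (1.0 - np.sum(p ** 2))
    return total

def mpsvm_split(X, y):
    """Pick one of the two angle bisectors of the proximal planes as the node test."""
    A, B = X[y == 0], X[y == 1]
    w1, b1 = proximal_plane(A, B)
    w2, b2 = proximal_plane(B, A)
    # With unit normals, the bisectors are the sum and difference of the planes.
    candidates = [(w1 + w2, b1 + b2), (w1 - w2, b1 - b2)]
    # Assumption: keep the bisector with the lower Gini impurity.
    return min(candidates, key=lambda c: gini_split(X @ c[0] - c[1] > 0, y))
```

A sample `x` would then be routed to one child when `x @ w - b > 0` and to the other otherwise, where `(w, b)` is the pair returned by `mpsvm_split`. Because the split normal `w` is a general vector rather than a single coordinate axis, the resulting test is oblique.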
