Progressive subspace ensemble learning

Few classifier ensemble approaches investigate the data sample space and the feature space at the same time, even though such a combined view helps build more powerful learning models. For example, AdaBoost explores only the data sample space, while the random subspace technique focuses only on the feature space. To address this limitation, we propose the progressive subspace ensemble learning approach (PSEL), which takes the data sample space and the feature space into account simultaneously. Specifically, PSEL first adopts the random subspace technique to generate a set of subspaces. It then applies a progressive selection process, based on new cost functions that incorporate both current and long-term information, to select classifiers sequentially. Finally, a weighted voting scheme summarizes the predicted labels and yields the final result. We also adopt a number of non-parametric tests to compare PSEL with its competitors over multiple datasets. The experimental results show that PSEL performs well on most real datasets and outperforms a number of state-of-the-art classifier ensemble approaches.

Highlights
- Progressive subspace ensemble learning.
- It takes the data sample space and the feature space into account at the same time.
- A progressive selection process based on new cost functions that incorporate current and long-term information selects the classifiers sequentially.
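To make the three stages described above concrete, the following is a minimal Python sketch of a PSEL-style pipeline built on scikit-learn. The choice of decision trees as base classifiers, the held-out validation split, the form of the cost function (a weighted mix of each classifier's own validation accuracy as the "current" term and the growing ensemble's accuracy as a stand-in for the "long-term" term), and the trade-off weight alpha are all illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
n_classes = len(np.unique(y))

# Step 1 -- random subspace generation: each base classifier is trained on a
# random subset of the features (sizes below are hypothetical settings).
n_subspaces, subspace_size = 20, 2
members = []
for _ in range(n_subspaces):
    feats = rng.choice(X.shape[1], size=subspace_size, replace=False)
    clf = DecisionTreeClassifier(random_state=0).fit(X_tr[:, feats], y_tr)
    members.append((clf, feats))

def weighted_vote(selected, X_eval):
    """Weighted voting: each selected classifier votes with its weight."""
    votes = np.zeros((X_eval.shape[0], n_classes))
    for clf, feats, w in selected:
        for i, p in enumerate(clf.predict(X_eval[:, feats])):
            votes[i, p] += w
    return votes.argmax(axis=1)

# Step 2 -- progressive selection: greedily add the classifier maximizing a cost
# that mixes its own validation accuracy (current term) with the accuracy of the
# growing ensemble (stand-in for the long-term term); alpha is illustrative.
alpha, n_select = 0.5, 10
selected, pool = [], list(range(len(members)))
for _ in range(n_select):
    best_idx, best_cost, best_acc = None, -np.inf, 0.0
    for idx in pool:
        clf, feats = members[idx]
        current = (clf.predict(X_val[:, feats]) == y_val).mean()
        trial = selected + [(clf, feats, current)]
        long_term = (weighted_vote(trial, X_val) == y_val).mean()
        cost = alpha * current + (1 - alpha) * long_term
        if cost > best_cost:
            best_idx, best_cost, best_acc = idx, cost, current
    selected.append((*members[best_idx], best_acc))
    pool.remove(best_idx)

# Step 3 -- weighted voting over the selected classifiers gives the final labels.
print("validation accuracy:", (weighted_vote(selected, X_val) == y_val).mean())
```

The sketch only mirrors the generate / progressively select / weighted-vote structure; the paper's concrete cost functions and weighting scheme may differ.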
