A discrete bacterial algorithm for feature selection in classification of microarray gene expression cancer data

When mining high-dimensional data, the curse of dimensionality is one of the major difficulties to overcome. In this paper, a weighted feature selection strategy is developed and embedded in bacterial-based algorithms to reduce the feature dimension in classification. The proposed strategy distinguishes features by their classification performance and by their occurrence frequency in the population, as recorded in two matrices. Three objectives are considered together: minimizing the number of features, maximizing classification performance, and minimizing computational cost. To address the drawbacks of bacterial-based algorithms, a feature selection algorithm based on Bacterial Colony Optimization is proposed to decrease computational complexity and to improve search ability, even in discrete optimization problems. To test the effectiveness of the proposed feature selection method, four bacterial-based methods with the weighted strategy embedded are compared with four classical feature selection methods and three well-known population-based algorithms on 15 cancer microarray datasets with different numbers of features and classes. The results show that the embedded weighted feature selection strategy improves the feature selection capability of bacterial algorithms. The new mechanisms embedded in the Bacterial Colony Optimization method can overcome the limitations of traditional bacterial-based algorithms, using premature termination to decrease the computational time, and provide comparable or, in most cases, better solutions than the other feature selection methods considered in the comparison.
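
The idea of weighting features by both classification performance and occurrence frequency can be illustrated with a minimal sketch. This is not the paper's algorithm: the binary-mask encoding, the function names `feature_weights` and `top_features`, the linear combination, and the parameter `alpha` are all assumptions made for illustration, and the fitness values are made-up numbers standing in for classifier accuracy.

```python
def feature_weights(population, fitness, alpha=0.5):
    """Combine per-feature performance and occurrence frequency into one weight.

    population: list of 0/1 masks, one per candidate solution (assumed encoding)
    fitness:    classification accuracy of each mask, in [0, 1] (assumed precomputed)
    alpha:      trade-off between performance and frequency (illustrative parameter)
    """
    n_features = len(population[0])
    weights = []
    for j in range(n_features):
        # Fitness values of the candidates that selected feature j
        scores = [f for mask, f in zip(population, fitness) if mask[j]]
        # Share of the population that selected feature j
        frequency = len(scores) / len(population)
        # Mean fitness of those candidates (0.0 if nobody selected the feature)
        performance = sum(scores) / len(scores) if scores else 0.0
        weights.append(alpha * performance + (1 - alpha) * frequency)
    return weights


def top_features(weights, k):
    """Return the indices of the k highest-weighted features, in ascending order."""
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return sorted(ranked[:k])


# Toy population: three candidate feature subsets over four features
population = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
fitness = [0.9, 0.6, 0.7]  # hypothetical accuracies, not results from the paper

weights = feature_weights(population, fitness)
selected = top_features(weights, k=2)
print(selected)
```

In this toy example, features 0 and 2 each appear in two of the three candidates and are associated with the higher accuracies, so they receive the largest weights. A weighted scheme of this kind lets frequently selected, well-performing features be favored when new candidate subsets are generated.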
