A Feature Subset Evaluation Method Based on Multi-objective Optimization

To remove irrelevant and redundant features from high-dimensional data while preserving classification accuracy, this paper proposes a supervised feature subset evaluation method based on multi-objective optimization. The method takes four aspects into account: sparsity of the feature space, classification accuracy, degree of information loss, and feature subset stability. A multi-objective function is constructed from these four aspects, and the popular NSGA-II algorithm is then used to optimize the four objectives during feature selection. Finally, the feature subset is selected from the resulting feature weight vector according to the four evaluation criteria. The proposed method was tested on four standard data sets using two kinds of classifiers. The experimental results show that the proposed method achieves higher classification accuracy than the other methods even though it selects only a small number of features. In addition, the proposed method has the lowest degree of information loss, which indicates that its selected feature subsets represent the original data sets best.
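The core idea behind optimizing several objectives at once, as NSGA-II does, is Pareto dominance: one candidate feature subset is preferred over another only if it is no worse on every objective and strictly better on at least one. The following is a minimal sketch of that comparison, not the paper's implementation. It assumes all four objectives are expressed as quantities to minimize (fraction of features kept, error rate, information loss, instability); the function names and the numeric values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b, assuming every objective is minimized."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of the points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical objective vectors for four candidate feature subsets:
# (fraction of features kept, error rate, information loss, instability)
candidates = [
    (0.10, 0.08, 0.20, 0.15),
    (0.30, 0.05, 0.10, 0.10),
    (0.30, 0.09, 0.25, 0.20),
    (0.60, 0.05, 0.10, 0.10),
]

front = pareto_front(candidates)
print(front)  # indices of the non-dominated candidates
```

In this toy data the third candidate is worse than the first on every objective, and the fourth matches the second on three objectives while keeping more features, so only the first two remain on the front. NSGA-II builds on this comparison by ranking a whole population into successive fronts and adding a crowding-distance measure to keep the solutions spread out.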
