Error curves for evaluating the quality of feature rankings

In this article, we propose a method for evaluating feature ranking algorithms. A feature ranking algorithm estimates the importance of descriptive features for predicting the target variable, and the proposed method evaluates the correctness of these importance values by computing error curves for two chains of predictive models. The models in the first chain are built on nested sets of top-ranked features, while the models in the second chain are built on nested sets of bottom-ranked features. We investigate which predictive models are appropriate for building these chains, showing empirically that the proposed method gives meaningful results and can detect differences in feature ranking quality. We demonstrate this first on synthetic data and then on several real-world classification benchmark problems.
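
A minimal sketch of the evaluation idea described above, assuming a scikit-learn-style classifier and a precomputed feature ranking; the names error_curve, ranking, and clf are illustrative, not taken from the paper. Each chain of models is built on nested feature sets drawn either from the top or from the bottom of the ranking, and each model's error is estimated by cross-validation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import cross_val_score

def error_curve(clf, X, y, ranking, from_top=True, cv=5):
    """Errors of a chain of models built on nested feature sets.

    ranking  : feature indices ordered from most to least important
    from_top : use top-ranked features if True, bottom-ranked otherwise
    """
    order = list(ranking) if from_top else list(ranking)[::-1]
    errors = []
    for k in range(1, len(order) + 1):
        subset = order[:k]  # nested set of the k best (or worst) features
        # Estimate the model's error on this feature subset
        acc = cross_val_score(clone(clf), X[:, subset], y, cv=cv).mean()
        errors.append(1.0 - acc)  # misclassification error
    return np.array(errors)
```

Under this sketch, a good ranking yields a top-feature curve that drops quickly while the bottom-feature curve stays high; comparing the two curves is then one plausible way to quantify ranking quality.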
