Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy

Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called the minimal-redundancy-maximal-relevance (mRMR) criterion, for first-order incremental feature selection. We then present a two-stage feature selection algorithm that combines mRMR with other, more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform an extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminant analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvements in feature selection and classification accuracy.
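The first-order incremental scheme described above can be sketched in a few lines: at each step, pick the candidate feature that maximizes its mutual information with the class labels minus its mean mutual information with the already-selected features (the difference form of mRMR). This is a minimal illustrative sketch for discrete features, assuming empirical plug-in estimates of mutual information; the function names `mutual_information` and `mrmr_select` are our own, not from the paper.

```python
# Sketch of first-order incremental mRMR selection (difference form):
# at each step choose the feature x maximizing I(x; y) - mean_{s in S} I(x; s).
# Mutual information is estimated from empirical counts (plug-in estimator),
# which assumes discrete-valued features; continuous features would need
# binning or a density estimator (e.g., Parzen windows) first.
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """I(X;Y) in nats, from two equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def mrmr_select(features, labels, k):
    """Greedily pick k feature indices; `features` is a list of columns."""
    relevance = [mutual_information(col, labels) for col in features]
    # Seed with the single most relevant feature (max-relevance step).
    selected = [max(range(len(features)), key=lambda i: relevance[i])]
    while len(selected) < k:
        def score(i):
            # Redundancy: mean MI between candidate i and selected features.
            red = sum(mutual_information(features[i], features[j])
                      for j in selected) / len(selected)
            return relevance[i] - red
        rest = [i for i in range(len(features)) if i not in selected]
        selected.append(max(rest, key=score))
    return selected
```

For example, given a highly relevant feature, an exact duplicate of it, and a weakly relevant but non-redundant feature, the duplicate is penalized by its redundancy and the non-redundant feature is chosen second. The two-stage algorithm in the paper would then pass such a compact candidate set to a wrapper for final refinement.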
