Correlation-based Feature Selection for Machine Learning

A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation-based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation-based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy.

CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance-based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set. Feature selection degraded machine learning performance only in cases where the eliminated features were highly predictive of very small areas of the instance space.

Further experiments compared CFS with a wrapper, a well-known approach to feature selection that employs the target learning algorithm to evaluate feature sets. In many cases CFS gave results comparable to the wrapper and, in general, outperformed the wrapper on small datasets. CFS executes many times faster than the wrapper, which allows it to scale to larger datasets. Two methods of extending CFS to handle feature interaction are presented and experimentally evaluated. The first considers pairs of features and the second incorporates feature weights calculated by the RELIEF algorithm. Experiments on artificial domains showed that both methods were able to identify interacting features. On natural domains, the pairwise method gave more reliable results than using the weights provided by RELIEF.
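
The subset evaluation heuristic described above can be made concrete with a small sketch. The Python snippet below is a minimal illustration, not the thesis implementation: it assumes the commonly cited CFS merit formula from test theory, Merit_S = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the number of features in subset S, r_cf is the mean feature-class correlation, and r_ff is the mean feature-feature intercorrelation, and it uses simple greedy forward selection to stand in for the heuristic search that CFS couples with the formula. The correlation values themselves are left as inputs, since the choice of correlation measure is a separate component of CFS.

```python
import math
from itertools import combinations

def merit(subset, class_corr, feat_corr):
    """Merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff),
    where r_cf is the mean feature-class correlation and r_ff is the
    mean pairwise feature-feature correlation within the subset."""
    k = len(subset)
    r_cf = sum(class_corr[f] for f in subset) / k
    if k > 1:
        pairs = list(combinations(sorted(subset), 2))
        r_ff = sum(feat_corr[p] for p in pairs) / len(pairs)
    else:
        r_ff = 0.0
    return (k * r_cf) / math.sqrt(k + k * (k - 1) * r_ff)

def forward_select(features, class_corr, feat_corr):
    """Greedy forward selection: repeatedly add the feature that most
    improves subset merit; stop when no addition helps."""
    selected, best = [], 0.0
    while True:
        candidate, cand_merit = None, best
        for f in features:
            if f in selected:
                continue
            m = merit(selected + [f], class_corr, feat_corr)
            if m > cand_merit:
                candidate, cand_merit = f, m
        if candidate is None:
            return selected, best
        selected.append(candidate)
        best = cand_merit

# Toy example with made-up correlation magnitudes (all values hypothetical).
class_corr = {"a": 0.80, "b": 0.75, "c": 0.10}
feat_corr = {("a", "b"): 0.90, ("a", "c"): 0.05, ("b", "c"): 0.05}
subset, score = forward_select(["a", "b", "c"], class_corr, feat_corr)
print(subset, round(score, 3))  # ['a'] 0.8 -- "b" is redundant given "a",
                                # and "c" is only weakly relevant.
```

A more thorough search strategy, such as best-first search, could replace the greedy loop without changing the scoring; the merit function is what encodes the "relevant but not redundant" hypothesis.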
