Feature Subset Selection by Estimation of Distribution Algorithms

Feature Subset Selection is a well-known task in the Machine Learning, Data Mining, Pattern Recognition and Text Learning paradigms. In this chapter, we present a set of techniques inspired by Estimation of Distribution Algorithms (EDAs) to tackle the Feature Subset Selection problem in Machine Learning and Data Mining tasks. Bayesian networks are used to factorize the probability distribution of the best solutions in datasets of small and medium dimensionality, and simpler probabilistic models are used in domains of larger dimensionality. In a comparison with different sequential and genetic-inspired algorithms on natural and artificial datasets, EDA-based approaches have obtained encouraging accuracy results and need a smaller number of evaluations than genetic approaches.
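
To make the general idea concrete, the following is a minimal sketch of an EDA loop over binary feature masks. It uses a univariate marginal model (in the style of UMDA) as one possible instance of the "simpler probabilistic models" mentioned above. It is not the chapter's Bayesian-network-based method. The function names, the parameter defaults, and the toy evaluation function are all assumptions made for illustration; in practice the evaluation would be a wrapper estimate such as the cross-validated accuracy of a classifier trained on the selected features.

```python
import random


def umda_feature_selection(evaluate, n_features, pop_size=50, n_select=25,
                           generations=30, seed=0):
    """Illustrative UMDA-style EDA over binary feature masks (a sketch)."""
    rng = random.Random(seed)
    # Univariate model: p[i] is the probability that feature i is selected.
    p = [0.5] * n_features
    best_mask, best_score = None, float("-inf")
    for _ in range(generations):
        # Sample a population of feature masks from the current model.
        population = [[1 if rng.random() < p[i] else 0
                       for i in range(n_features)]
                      for _ in range(pop_size)]
        # Evaluate every mask and rank from best to worst.
        scored = sorted(((evaluate(m), m) for m in population),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best_mask = scored[0]
        # Re-estimate the marginals from the best n_select individuals.
        selected = [m for _, m in scored[:n_select]]
        for i in range(n_features):
            count = sum(m[i] for m in selected)
            # Laplace smoothing keeps each probability away from 0 and 1.
            p[i] = (count + 1) / (n_select + 2)
    return best_mask, best_score


# Hypothetical stand-in for a wrapper evaluation: rewards three "relevant"
# features and penalizes every additional feature that is selected.
RELEVANT = {0, 3, 5}


def toy_evaluate(mask):
    hits = sum(mask[i] for i in RELEVANT)
    extras = sum(mask) - hits
    return hits - 0.5 * extras


best_mask, best_score = umda_feature_selection(toy_evaluate, n_features=10)
print(best_mask, best_score)
```

Replacing the independent marginals with a model that captures dependencies between features, such as a learned Bayesian network, would move this sketch closer to the approach the abstract describes for small and medium dimensionality, at the cost of a more expensive model-learning step in each generation.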
