Feature Weighting for Nearest Neighbor by Estimation of Distribution Algorithms

The accuracy of a Nearest Neighbor classifier depends heavily on the weight of each feature in its distance metric. In this paper, two new methods, FW-EBNA (Feature Weighting by Estimation of Bayesian Network Algorithm) and FW-EGNA (Feature Weighting by Estimation of Gaussian Network Algorithm), inspired by the Estimation of Distribution Algorithm (EDA) approach, are used together with a wrapper evaluation scheme to learn accurate feature weights for the Nearest Neighbor algorithm. FW-EBNA chooses from a set of three possible discrete weights, whereas FW-EGNA works in a continuous range of weights. Both methods are compared, on a set of natural and artificial domains, with two sequential algorithms and one Genetic Algorithm.
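
To make the idea concrete, the following is a minimal sketch of wrapper-based feature weighting with an EDA loop. It is not the paper's FW-EBNA: it replaces the Bayesian network with independent per-feature marginals (a univariate, UMDA-style model), which is a deliberate simplification. The weight values 0, 0.5 and 1, the leave-one-out score used as the wrapper evaluation, the toy data, and all function names and parameters are assumptions made for illustration only.

```python
import random

# Assumed discrete weight values; the abstract only says there are three.
WEIGHTS = (0.0, 0.5, 1.0)

def weighted_sq_dist(a, b, w):
    """Squared Euclidean distance with one weight per feature."""
    return sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b))

def loo_accuracy(X, y, w):
    """Wrapper score: leave-one-out accuracy of 1-NN under weights w."""
    hits = 0
    for i in range(len(X)):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: weighted_sq_dist(X[i], X[j], w))
        hits += y[nearest] == y[i]
    return hits / len(X)

def eda_feature_weights(X, y, pop=30, keep=10, gens=10, seed=0):
    """Univariate EDA over discrete weights (stand-in for a Bayesian network)."""
    rng = random.Random(seed)
    d = len(X[0])
    # Start from a uniform distribution over the three weights per feature.
    probs = [[1.0 / len(WEIGHTS)] * len(WEIGHTS) for _ in range(d)]
    best_score, best_w = -1.0, None
    for _ in range(gens):
        # Sample a population of weight vectors from the current model.
        population = [[rng.choices(WEIGHTS, probs[f])[0] for f in range(d)]
                      for _ in range(pop)]
        # Evaluate each candidate with the wrapper and rank them.
        scored = sorted(((loo_accuracy(X, y, w), w) for w in population),
                        key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best_w = scored[0]
        # Re-estimate the marginals from the selected individuals,
        # with add-one smoothing so no weight value dies out.
        elite = [w for _, w in scored[:keep]]
        for f in range(d):
            counts = [1 + sum(w[f] == v for w in elite) for v in WEIGHTS]
            total = sum(counts)
            probs[f] = [c / total for c in counts]
    return best_score, best_w

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, -3.0], [0.2, 4.0],
     [1.0, -4.0], [1.1, 5.0], [1.2, -3.0]]
y = [0, 0, 0, 1, 1, 1]

score, weights = eda_feature_weights(X, y)
print("best leave-one-out accuracy:", score)
print("learned weights:", weights)
```

A continuous variant in the spirit of FW-EGNA would sample real-valued weights from an estimated Gaussian model instead of the discrete marginals used here; the wrapper evaluation would stay the same.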
