Similarity Metric Learning for a Variable-Kernel Classifier

Nearest-neighbor interpolation algorithms have many useful properties for applications to learning, but they often exhibit poor generalization. In this paper, it is shown that much better generalization can be obtained by using a variable interpolation kernel in combination with conjugate gradient optimization of the similarity metric and kernel size. The resulting method is called variable-kernel similarity metric (VSM) learning. It has been tested on several standard classification data sets, and on these problems it shows better generalization than backpropagation and most other learning methods. The number of parameters that must be determined through optimization is orders of magnitude smaller than for backpropagation or radial basis function (RBF) networks, which may indicate that the method better captures the essential degrees of variation in learning. Other features of VSM learning that make it relevant to models of biological learning in the brain are also discussed.
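The abstract describes the method only at a high level. The Python sketch below illustrates the general idea under explicit assumptions that are not taken from the paper: the similarity metric is a per-feature weighted Euclidean distance, the variable kernel is a Gaussian whose width is a multiplier r times the mean distance to the m nearest neighbours (m = 5, an arbitrary choice), class probabilities are kernel-weighted votes of those same neighbours, and the objective is the leave-one-out squared error. The metric weights and log r are then tuned with a hand-written Polak-Ribière conjugate gradient using finite-difference gradients. The names `vsm_loss` and `conjugate_gradient` and the toy data are hypothetical.

```python
import math
import random

def vsm_loss(params, X, y, n_classes, m=5):
    """Leave-one-out squared error of a variable-kernel classifier (assumed form).

    params = [w_1, ..., w_d, log_r]: per-feature metric weights and log kernel-width multiplier.
    """
    d = len(X[0])
    w, r = params[:d], math.exp(params[d])  # exp keeps the width multiplier positive
    total = 0.0
    for i in range(len(X)):
        # Weighted squared distances to every other point (point i is left out).
        dists = sorted(
            (sum((w[k] * (X[i][k] - X[j][k])) ** 2 for k in range(d)), y[j])
            for j in range(len(X)) if j != i
        )
        neigh = dists[:m]
        # Variable kernel width: r times the mean distance to the m nearest neighbours.
        sigma = r * sum(math.sqrt(d2) for d2, _ in neigh) / len(neigh) + 1e-12
        kw = [math.exp(-d2 / (2 * sigma ** 2)) for d2, _ in neigh]
        z = sum(kw) + 1e-12
        probs = [0.0] * n_classes
        for wt, (_, c) in zip(kw, neigh):
            probs[c] += wt / z
        # Squared error against the one-hot target for point i.
        total += sum(((1.0 if c == y[i] else 0.0) - probs[c]) ** 2 for c in range(n_classes))
    return total

def num_grad(f, x, h=1e-5):
    """Central finite-difference gradient of f at x."""
    g = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def conjugate_gradient(f, x0, iters=20, tol=1e-8):
    """Polak-Ribiere (PR+) nonlinear conjugate gradient with Armijo backtracking."""
    x, fx = list(x0), f(x0)
    g = num_grad(f, x)
    p = [-gi for gi in g]
    for _ in range(iters):
        gp = sum(gi * pi for gi, pi in zip(g, p))
        if gp >= 0:  # not a descent direction: restart with steepest descent
            p = [-gi for gi in g]
            gp = -sum(gi * gi for gi in g)
        if -gp < tol:
            break
        step, accepted = 1.0, False
        while step > 1e-10:
            x_new = [xi + step * pi for xi, pi in zip(x, p)]
            f_new = f(x_new)
            if f_new <= fx + 1e-4 * step * gp:  # sufficient decrease
                accepted = True
                break
            step *= 0.5
        if not accepted:
            break
        g_new = num_grad(f, x_new)
        beta = max(0.0, sum(gn * (gn - go) for gn, go in zip(g_new, g))
                   / (sum(go * go for go in g) + 1e-30))
        p = [-gn + beta * pi for gn, pi in zip(g_new, p)]
        x, fx, g = x_new, f_new, g_new
    return x, fx

# Hypothetical toy data: feature 0 separates the classes, feature 1 is pure noise.
rng = random.Random(0)
X = [[2.0 * (i % 2) + rng.gauss(0, 0.5), rng.gauss(0, 5.0)] for i in range(40)]
y = [i % 2 for i in range(40)]

def loss(p):
    return vsm_loss(p, X, y, n_classes=2)

start = [1.0, 1.0, 0.0]  # w_0, w_1, log r
best, best_loss = conjugate_gradient(loss, start, iters=20)
print("initial loss:", round(loss(start), 3), "final loss:", round(best_loss, 3))
print("metric weights:", [round(v, 3) for v in best[:2]], "r:", round(math.exp(best[2]), 3))
```

In this sketch the only optimized quantities are the d metric weights and r, which is consistent with the abstract's point that far fewer parameters are fitted than in a backpropagation or RBF network. Because the line search only accepts steps that reduce the loss, the leave-one-out error never increases; for larger data sets, analytic gradients would be much cheaper than finite differences.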
