Online and batch learning of pseudo-metrics

We describe and analyze an online algorithm for supervised learning of pseudo-metrics. The algorithm receives pairs of instances and predicts their similarity according to a pseudo-metric. The pseudo-metrics we use are quadratic forms parameterized by positive semi-definite matrices. The core of the algorithm is an update rule that is based on successive projections onto the positive semi-definite cone and onto half-space constraints imposed by the examples. We describe an efficient procedure for performing these projections, derive a worst case mistake bound on the similarity predictions, and discuss a dual version of the algorithm in which it is simple to incorporate kernel operators. The online algorithm also serves as a building block for deriving a large-margin batch algorithm. We demonstrate the merits of the proposed approach by conducting experiments on MNIST dataset and on document filtering.

[1]  F ROSENBLATT,et al.  The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.

[2]  J. H. Wilkinson The algebraic eigenvalue problem , 1966 .

[3]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[4]  David G. Stork,et al.  Pattern Classification , 1973 .

[5]  Chris Buckley,et al.  Pivoted document length normalization , 1996, SIGIR '96.

[6]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[7]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[8]  Andrzej Stachurski,et al.  Parallel Optimization: Theory, Algorithms and Applications , 2000, Scalable Comput. Pract. Exp..

[9]  Mark Herbster,et al.  Learning Additive Models Online with Fast Evaluating Kernels , 2001, COLT/EuroCOLT.

[10]  Shigeo Abe DrEng Pattern Classification , 2001, Springer London.

[11]  John Shawe-Taylor,et al.  The Perceptron Algorithm with Uneven Margins , 2002, ICML.

[12]  Misha Pavel,et al.  Adjustment Learning and Relevant Component Analysis , 2002, ECCV.

[13]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[14]  Koby Crammer,et al.  Online Passive-Aggressive Algorithms , 2003, J. Mach. Learn. Res..

[15]  Pat Langley,et al.  Editorial: On Machine Learning , 1986, Machine Learning.

[16]  Amos Storkey,et al.  Advances in Neural Information Processing Systems 20 , 2007 .