Variational Relevance Vector Machines

The Support Vector Machine (SVM) of Vapnik (1998) has become widely established as one of the leading approaches to pattern recognition and machine learning. It expresses predictions in terms of a linear combination of kernel functions centred on a subset of the training data, known as support vectors. Despite its widespread success, the SVM suffers from some important limitations, one of the most significant being that it makes point predictions rather than generating predictive distributions. Recently Tipping (1999) has formulated the Relevance Vector Machine (RVM), a probabilistic model whose functional form is equivalent to the SVM. It achieves comparable recognition accuracy to the SVM, yet provides a full predictive distribution, and also requires substantially fewer kernel functions. The original treatment of the RVM relied on the use of type II maximum likelihood (the `evidence framework') to provide point estimates of the hyperparameters which govern model sparsity. In this paper we show how the RVM can be formulated and solved within a completely Bayesian paradigm through the use of variational inference, thereby giving a posterior distribution over both parameters and hyperparameters. We demonstrate the practicality and performance of the variational RVM using both synthetic and real world examples.

[1]  Radford M. Neal A new view of the EM algorithm that justifies incremental and other variants , 1993 .

[2]  Michael I. Jordan,et al.  Advances in Neural Information Processing Systems 30 , 1995 .

[3]  David J. C. MacKay,et al.  The Evidence Framework Applied to Classification Networks , 1992, Neural Computation.

[4]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[5]  Steve R. Waterhouse,et al.  Bayesian Methods for Mixtures of Experts , 1995, NIPS.

[6]  Marti A. Hearst Trends & Controversies: Support Vector Machines , 1998, IEEE Intell. Syst..

[7]  Takashi Matsumoto,et al.  Reconstruction and prediction of nonlinear dynamical systems: a hierarchical Bayes approach with neural nets , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[8]  Grace Wahba,et al.  Margin-like quantities and generalized approximate cross validation for support vector machines , 1999, Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No.98TH8468).

[9]  Michael I. Jordan,et al.  Bayesian parameter estimation via variational methods , 2000, Stat. Comput..

[10]  J. Berger Statistical Decision Theory and Bayesian Analysis , 1988 .

[11]  Christopher M. Bishop,et al.  Bayesian PCA , 1998, NIPS.

[12]  Michael E. Tipping The Relevance Vector Machine , 1999, NIPS.

[13]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[14]  Michael I. Jordan,et al.  An Introduction to Variational Methods for Graphical Models , 1999, Machine Learning.

[15]  Peter Sollich Probabilistic interpretations and Bayesian methods for support vector machines , 1999 .

[16]  Brian D. Ripley,et al.  Neural Networks and Related Methods for Classification , 1994 .