Extensions of the Informative Vector Machine

The informative vector machine (IVM) is a practical method for Gaussian process regression and classification. The IVM produces a sparse approximation to a Gaussian process by combining assumed density filtering with a heuristic that selects the points whose inclusion most reduces posterior entropy. This paper extends the IVM in several ways. First, we propose a novel noise model that allows the IVM to be applied to a mixture of labeled and unlabeled data. Second, we apply the IVM to a block-diagonal covariance matrix for “learning to learn” from related tasks. Third, we modify the IVM to incorporate prior knowledge from known invariances. All of these extensions are tested on artificial and real data.
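
The entropy-based selection heuristic can be illustrated with a minimal sketch, under stated assumptions rather than as the paper's implementation. It assumes Gaussian-noise regression, where the assumed density filtering update is exact and the entropy reduction from including point i is 0.5 * log(1 + var_i / noise_var). The names `rbf_kernel`, `ivm_select`, and `noise_var` are hypothetical, and the sketch stores the full n-by-n posterior covariance for clarity, whereas a practical sparse method would keep only a low-rank representation.

```python
import numpy as np


def rbf_kernel(X, Z, lengthscale=1.0):
    """Squared-exponential covariance between the rows of X and Z."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)


def ivm_select(K, noise_var, d):
    """Greedily pick d points by largest posterior entropy reduction.

    Assumes Gaussian observation noise with variance noise_var.
    Returns the active-set indices and the posterior covariance
    conditioned on observations at those points.
    """
    n = K.shape[0]
    A = K.copy()                      # posterior covariance, starts at the prior
    chosen = np.zeros(n, dtype=bool)  # marks points already in the active set
    active = []
    for _ in range(d):
        var = np.diag(A).copy()
        # Entropy reduction obtained by including each candidate point.
        delta_h = 0.5 * np.log1p(var / noise_var)
        delta_h[chosen] = -np.inf     # never pick the same point twice
        i = int(np.argmax(delta_h))
        # Rank-one Gaussian conditioning on a noisy observation at point i.
        a_i = A[:, i].copy()
        A -= np.outer(a_i, a_i) / (noise_var + var[i])
        chosen[i] = True
        active.append(i)
    return active, A


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2))
    K = rbf_kernel(X, X)
    active, A = ivm_select(K, noise_var=0.1, d=5)
    print("active set:", active)
```

With a shared noise variance, this criterion reduces to picking the point with the largest current posterior variance. For non-Gaussian noise models (classification, or the labeled/unlabeled mixture mentioned above), the rank-one update would instead come from moment matching, which this sketch does not cover.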
