Transferring Nonlinear Representations using Gaussian Processes with a Shared Latent Space

When a series of problems is related, representations learned from earlier tasks can aid learning on later ones. In this paper we propose a novel approach to transfer learning based on low-dimensional, nonlinear latent spaces. We show how such representations can be learned jointly across multiple tasks within a Gaussian process framework. When the shared representation is transferred to new tasks with relatively few training examples, learning can be faster and/or more accurate. Experiments on digit recognition and newsgroup classification show significantly improved performance compared to baselines that use a representation derived from a semi-supervised learning approach or a discriminative approach trained only on the target data.
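To make the idea of a jointly learned latent space concrete, the following is a minimal sketch of a shared-latent-space GP-LVM: several tasks observe the same examples, each task's outputs are modelled as a GP regression from a single shared latent matrix, and the latent coordinates plus kernel hyperparameters are fit by maximising the sum of per-task marginal likelihoods. This is not the paper's exact model or inference procedure; it assumes an RBF kernel, point estimates of the latent coordinates, and scipy's numerical gradients for brevity, and all function names are illustrative.

```python
# Minimal sketch of a shared-latent-space GP-LVM for multi-task transfer.
# Each task t has observations Ys[t] (N x D_t) over the SAME N examples;
# all tasks share one latent matrix X (N x Q).
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, lengthscale, variance, noise):
    # Squared-exponential kernel plus observation noise on the diagonal.
    sqdist = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return variance * np.exp(-0.5 * sqdist / lengthscale**2) + noise * np.eye(len(X))

def neg_log_marginal(params, Ys, N, Q):
    # Unpack shared latent coordinates and (log) kernel hyperparameters.
    X = params[:N * Q].reshape(N, Q)
    lengthscale, variance, noise = np.exp(params[N * Q:])
    K = rbf_kernel(X, lengthscale, variance, noise)
    L = np.linalg.cholesky(K)
    logdet = 2 * np.sum(np.log(np.diag(L)))
    nll = 0.0
    for Y in Ys:                      # sum GP marginal likelihoods over tasks
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))
        D = Y.shape[1]
        nll += 0.5 * (np.sum(Y * alpha) + D * logdet + N * D * np.log(2 * np.pi))
    nll += 0.5 * np.sum(X**2)         # simple Gaussian prior on X
    return nll

def fit_shared_latent_space(Ys, Q=2, seed=0):
    # Jointly optimise the shared latent space X and kernel hyperparameters.
    rng = np.random.default_rng(seed)
    N = Ys[0].shape[0]
    x0 = np.concatenate([0.1 * rng.standard_normal(N * Q), np.zeros(3)])
    res = minimize(neg_log_marginal, x0, args=(Ys, N, Q), method="L-BFGS-B")
    return res.x[:N * Q].reshape(N, Q)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Z = rng.standard_normal((40, 2))            # ground-truth latents (toy data)
    Ys = [np.tanh(Z @ rng.standard_normal((2, 5))) + 0.05 * rng.standard_normal((40, 5))
          for _ in range(3)]                    # three related toy tasks
    X = fit_shared_latent_space(Ys, Q=2)
    print("learned shared latent coordinates:", X.shape)
```

In a transfer setting, the learned latent space and kernel hyperparameters would then serve as the starting point (or prior) for a new task with few labelled examples, rather than learning a representation from scratch.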
