Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes

We show how to use unlabeled data and a deep belief net (DBN) to learn a good covariance kernel for a Gaussian process. We first learn a deep generative model of the unlabeled data using the fast, greedy algorithm introduced by [7]. If the data is high-dimensional and highly-structured, a Gaussian kernel applied to the top layer of features in the DBN works much better than a similar kernel applied to the raw input. Performance at both regression and classification can then be further improved by using backpropagation through the DBN to discriminatively fine-tune the covariance kernel.

[1]  J. K. Benedetti On the Nonparametric Estimation of Regression Functions , 1977 .

[2]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[3]  Matthias W. Seeger,et al.  Covariance Kernels from Bayesian Generative Models , 2001, NIPS.

[4]  Tom Minka,et al.  Expectation Propagation for approximate Bayesian inference , 2001, UAI.

[5]  Bernhard Schölkopf,et al.  Estimating a Kernel Fisher Discriminant in the Presence of Label Noise , 2001, ICML.

[6]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[7]  Geoffrey E. Hinton,et al.  Exponential Family Harmoniums with an Application to Information Retrieval , 2004, NIPS.

[8]  Neil D. Lawrence,et al.  Semi-supervised Learning via Gaussian Processes , 2004, NIPS.

[9]  Zoubin Ghahramani,et al.  Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning , 2004, NIPS.

[10]  Joaquin Quiñonero Candela,et al.  Local distance preservation in the GP-LVM through back constraints , 2006, ICML.

[11]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[12]  Alexander Zien,et al.  Semi-Supervised Learning , 2006 .

[13]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[14]  Bernhard Schölkopf,et al.  Introduction to Semi-Supervised Learning , 2006, Semi-Supervised Learning.

[15]  Thomas Hofmann,et al.  Greedy Layer-Wise Training of Deep Networks , 2007 .

[16]  Yoshua Bengio,et al.  Scaling learning algorithms towards AI , 2007 .

[17]  Ching Y. Suen,et al.  A trainable feature extractor for handwritten digit recognition , 2007, Pattern Recognit..

[18]  Geoffrey E. Hinton,et al.  Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure , 2007, AISTATS.

[19]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.