Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning

We present a novel Bayesian approach to the problem of value function estimation in continuous state spaces. We define a probabilistic generative model for the value function by imposing a Gaussian prior over value functions and assuming a Gaussian noise model. Because the random processes involved are Gaussian, the posterior distribution of the value function is also Gaussian and is therefore described entirely by its mean and covariance. We derive exact expressions for the posterior process moments and, using an efficient sequential sparsification method, describe an on-line algorithm for learning them. We demonstrate the operation of the algorithm on a 2-dimensional continuous spatial navigation domain.
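To make "exact expressions for the posterior process moments" concrete, here is a minimal sketch of the batch (non-sparse) posterior. It assumes a Bellman-style observation model r_i = V(x_i) - gamma * V(x_{i+1}) + n_i, with V ~ GP(0, k) and i.i.d. Gaussian noise n_i of variance sigma^2. Writing H for the t x (t+1) matrix with rows (..., 1, -gamma, ...), the posterior at a query point x is then Gaussian with mean k(x)^T H^T G^{-1} r and variance k(x, x) - k(x)^T H^T G^{-1} H k(x), where G = H K H^T + sigma^2 I. The squared-exponential kernel, the parameter values, and the names `gptd_posterior`, `rbf_kernel`, and `solve` are illustrative assumptions, not taken from the paper. The sequential sparsification used by the on-line algorithm is not shown.

```python
import math


def rbf_kernel(x, y, length_scale=1.0):
    """Squared-exponential kernel on points such as 2-D positions (illustrative choice)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * length_scale ** 2))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (pure Python)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = s / M[r][r]
    return z


def gptd_posterior(states, rewards, query, gamma=0.9, sigma=0.1, kernel=rbf_kernel):
    """Batch posterior mean and variance of V(query).

    Assumed model: r_i = V(x_i) - gamma * V(x_{i+1}) + n_i for i = 0..t-1,
    with V ~ GP(0, kernel) and n_i i.i.d. N(0, sigma^2).
    """
    t = len(rewards)
    if len(states) != t + 1:
        raise ValueError("need exactly one more state than rewards")
    K = [[kernel(a, b) for b in states] for a in states]
    # (H K)[i][j] = K[i][j] - gamma * K[i+1][j]
    HK = [[K[i][j] - gamma * K[i + 1][j] for j in range(t + 1)] for i in range(t)]
    # G = H K H^T + sigma^2 I, a t x t symmetric positive-definite matrix
    G = [[HK[i][j] - gamma * HK[i][j + 1] + (sigma ** 2 if i == j else 0.0)
          for j in range(t)] for i in range(t)]
    k_q = [kernel(query, s) for s in states]
    hk = [k_q[i] - gamma * k_q[i + 1] for i in range(t)]  # H k(x)
    alpha = solve(G, rewards)  # G^{-1} r
    mean = sum(a * h for a, h in zip(alpha, hk))
    beta = solve(G, hk)  # G^{-1} H k(x)
    var = kernel(query, query) - sum(b * h for b, h in zip(beta, hk))
    return mean, var
```

For example, `gptd_posterior([(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)], [0.0, 1.0], (0.5, 0.0))` returns the posterior mean and variance of the value at the middle state of a short 2-D trajectory. Solving against G directly keeps the sketch short; it costs O(t^3) per query, which is the cost that the sparsification described in the abstract is meant to avoid.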
