Online kernel selection for Bayesian reinforcement learning

Kernel-based Bayesian methods for Reinforcement Learning (RL) such as Gaussian Process Temporal Difference (GPTD) are particularly promising because they rigorously treat uncertainty in the value function and make it easy to specify prior knowledge. However, the choice of prior distribution significantly affects the empirical performance of the learning agent, and little work has been done extending existing methods for prior model selection to the online setting. This paper develops Replacing-Kernel RL, an online model selection method for GPTD using sequential Monte-Carlo methods. Replacing-Kernel RL is compared to standard GPTD and tile-coding on several RL domains, and is shown to yield significantly better asymptotic performance for many different kernel families. Furthermore, the resulting kernels capture an intuitively useful notion of prior state covariance that may nevertheless be difficult to capture manually.

[1]  L. Buşoniu Evolutionary function approximation for reinforcement learning , 2006 .

[2]  Bernhard Schölkopf,et al.  A tutorial on support vector regression , 2004, Stat. Comput..

[3]  Trevor Hastie,et al.  The Elements of Statistical Learning , 2001 .

[4]  Thomas G. Dietterich Adaptive computation and machine learning , 1998 .

[5]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[6]  Yaakov Engel,et al.  Algorithms and representations for reinforcement learning (עם תקציר בעברית, תכן ושער נוסף: אלגוריתמים וייצוגים ללמידה מחיזוקים.; אלגוריתמים וייצוגים ללמידה מחיזוקים.) , 2005 .

[7]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[8]  M. Loth,et al.  Sparse Temporal Difference Learning Using LASSO , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.

[9]  Marc G. Genton,et al.  Classes of Kernels for Machine Learning: A Statistics Perspective , 2002, J. Mach. Learn. Res..

[10]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[11]  N. Gordon,et al.  Novel approach to nonlinear/non-Gaussian Bayesian state estimation , 1993 .

[12]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[13]  Michael L. Littman,et al.  Efficient Reinforcement Learning with Relocatable Action Models , 2007, AAAI.

[14]  Matthias W. Seeger,et al.  Covariance Kernels from Bayesian Generative Models , 2001, NIPS.

[15]  Shie Mannor,et al.  Reinforcement learning with Gaussian processes , 2005, ICML.

[16]  Christopher K. I. Williams,et al.  Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning) , 2005 .

[17]  Daniel Polani,et al.  Least Squares SVM for Least Squares TD Learning , 2006, ECAI.