Reinforcement learning with Gaussian processes

Gaussian Process Temporal Difference (GPTD) learning offers a Bayesian solution to the policy evaluation problem of reinforcement learning. In this paper we extend the GPTD framework by addressing two pressing issues that were not adequately treated in the original GPTD paper (Engel et al., 2003). The first is stochasticity in the state transitions; the second is action selection and policy improvement. We present a new generative model for the value function, deduced from its relation to the discounted return, and derive a corresponding on-line algorithm for learning the posterior moments of the value Gaussian process. We also present a SARSA-based extension of GPTD, termed GPSARSA, which allows actions to be selected and policies to be gradually improved without requiring a world model.
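To make the quantities in the abstract concrete, the following is a minimal batch sketch of the GPTD posterior moments, not the sparse on-line algorithm derived in the paper. It assumes the generative model relating rewards to value differences, r_i = V(x_i) - gamma*V(x_{i+1}) + noise, with a zero-mean GP prior on V; the squared-exponential kernel, the noise scale sigma, the correlated noise covariance used for the stochastic-transition case, and the synthetic trajectory data are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel matrix between state sets A (n,d) and B (m,d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def gptd_posterior(states, rewards, query, gamma=0.9, sigma=1.0):
    """Batch GPTD posterior mean and variance of the value function.

    Stacking the t reward observations r_i = V(x_i) - gamma*V(x_{i+1}) + N_i
    gives r = H v + N, where H is the t x (t+1) difference matrix below and
    v is the vector of (latent) values at the visited states.
    """
    t = len(rewards)
    idx = np.arange(t)
    H = np.zeros((t, t + 1))
    H[idx, idx] = 1.0
    H[idx, idx + 1] = -gamma
    K = rbf_kernel(states, states)          # prior covariance of v
    # Correlated noise covariance sigma^2 * H H^T, as in the stochastic-
    # transition model (the earlier deterministic model used white noise).
    Sigma = sigma ** 2 * (H @ H.T)
    Q = np.linalg.inv(H @ K @ H.T + Sigma)  # small t, so direct inverse is fine
    k_q = rbf_kernel(query, states)         # cross-covariances k(x) at query points
    mean = k_q @ H.T @ Q @ rewards
    cov = rbf_kernel(query, query) - k_q @ H.T @ Q @ H @ k_q.T
    return mean, np.maximum(np.diag(cov), 0.0)

# Illustrative run on made-up data: a short trajectory of 1-D states.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 1))   # states x_0, ..., x_5 along one trajectory
r = rng.normal(size=5)        # rewards r_0, ..., r_4
mean, var = gptd_posterior(X, r, X)
print(mean.round(2), var.round(2))
```

The posterior variance returned alongside the mean is what distinguishes the Bayesian treatment from point-estimate TD methods: it quantifies the remaining uncertainty in the value estimate at each state, which GPSARSA can exploit when selecting actions.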