Reinforcement learning for continuous action using stochastic gradient ascent

This paper considers a reinforcement learning (RL) problem in which the set of possible actions is continuous and the reward is considerably delayed. The proposed method is based on stochastic gradient ascent in the policy parameter space: it does not require a model of the environment to be given or learned, it does not need to approximate the value function explicitly, and it is incremental, requiring only a constant amount of computation per step. We demonstrate the method's behavior through a simple linear regulator problem and a cart-pole control problem.
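To make the general idea concrete, the following is a minimal sketch of stochastic gradient ascent on the parameters of a Gaussian policy for a one-dimensional linear regulator, in the spirit of Williams-style REINFORCE. It is not the paper's exact algorithm; the dynamics, names (`theta`, `step`), and constants are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's algorithm): REINFORCE-style
# stochastic gradient ascent on a Gaussian policy for a 1-D linear regulator.
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dynamics: x' = x + u + noise; reward -x'^2 penalizes
# distance from the origin (reward arrives after the transition).
def step(x, u):
    x_next = x + u + 0.05 * rng.standard_normal()
    return x_next, -x_next ** 2

# Gaussian policy: u ~ N(theta * x, sigma^2). Its score function
# ("characteristic eligibility") is d log pi / d theta = (u - theta*x) * x / sigma^2.
theta = 0.0    # policy parameter (feedback gain) to be learned
sigma = 0.5    # fixed exploration noise
alpha = 0.01   # step size for gradient ascent
gamma = 0.95   # discount factor for the delayed reward

for episode in range(2000):
    x = rng.uniform(-1.0, 1.0)
    score_sum = 0.0   # accumulated d log pi / d theta over the episode
    ret = 0.0         # discounted return
    discount = 1.0
    for t in range(20):
        mean = theta * x
        u = mean + sigma * rng.standard_normal()
        score_sum += (u - mean) * x / sigma ** 2
        x, r = step(x, u)
        ret += discount * r
        discount *= gamma
    # Unbiased REINFORCE estimate of the policy gradient: return times
    # the summed score; ascend it to improve the policy parameter.
    theta += alpha * ret * score_sum

print("learned gain theta =", theta)  # should drift toward a stabilizing gain near -1
```

This sketch estimates the gradient from whole episodes; the paper's method is incremental, updating the parameters with constant computation at every step, so the above should be read only as an illustration of gradient ascent on a stochastic policy without a model or an explicit value function.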