Paper Information - Reinforcement Learning in Continuous Action Spaces

Reinforcement Learning in Continuous Action Spaces

Considerable research has addressed reinforcement learning in continuous environments, but research on problems where the actions can also be chosen from a continuous space is much more limited. We present a new class of algorithms, named continuous actor critic learning automaton (CACLA), that can handle continuous states and actions. The resulting algorithm is straightforward to implement. We compare it experimentally with other algorithms that can handle continuous action spaces. These experiments show that CACLA performs much better than the other algorithms, especially when it is combined with a Gaussian exploration method.
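As a rough illustration of why the abstract calls the algorithm straightforward to implement, the following is a minimal sketch of one CACLA-style update, not the authors' code. It assumes a scalar action and linear function approximators over a feature vector; the function names (`gaussian_explore`, `cacla_update`), the step sizes `alpha` and `beta`, and the discount `gamma` are hypothetical choices made for this example. The critic is updated with the temporal-difference error, and the actor is moved toward the action that was actually taken only when that error is positive.

```python
import numpy as np


def gaussian_explore(rng, greedy_action, sigma):
    """Gaussian exploration: sample an action centred on the actor's output."""
    return rng.normal(greedy_action, sigma)


def cacla_update(v_w, a_w, phi, phi_next, action, reward,
                 alpha=0.1, beta=0.1, gamma=0.9, terminal=False):
    """One CACLA-style step (illustrative sketch, assumed linear models).

    Critic: V(s)  = v_w . phi(s)
    Actor:  Ac(s) = a_w . phi(s)   (scalar action)
    Both weight vectors are float arrays and are updated in place.
    Returns the temporal-difference error.
    """
    v = v_w @ phi
    v_next = 0.0 if terminal else v_w @ phi_next
    delta = reward + gamma * v_next - v

    # Actor output is read before any weights change.
    actor_out = a_w @ phi

    # The critic always learns from the TD error.
    v_w += alpha * delta * phi

    # The actor moves toward the taken action only if it turned out
    # better than expected (positive TD error).
    if delta > 0:
        a_w += beta * (action - actor_out) * phi
    return delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v_w, a_w = np.zeros(2), np.zeros(2)
    phi, phi_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    a = gaussian_explore(rng, a_w @ phi, sigma=0.5)
    print(cacla_update(v_w, a_w, phi, phi_next, a, reward=1.0))
```

Note that the actor update uses only the sign of the TD error, not its magnitude, so the actor step is a plain move toward the explored action in action space.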

Hado Philip van Hasselt | M.A. Wiering
