Application of reinforcement learning to balancing of Acrobot

The Acrobot is a two-link robot, actuated only at the joint between the two links. Controlling the Acrobot is a difficult task in reinforcement learning (RL) because the system has nonlinear dynamics and continuous state and action spaces. In this article, we apply RL to the task of balancing control of the Acrobot. Our RL method has an architecture similar to the actor-critic. The actor and the critic are approximated by normalized Gaussian networks, which are trained by an online EM algorithm. We also introduce eligibility traces for our actor-critic architecture. Computer simulations show that our method achieves fairly good control with a small number of trials.
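To illustrate the kind of function approximator named above, the following is a minimal sketch of a normalized Gaussian (radial basis function) network used as a critic over the Acrobot's four-dimensional state. The class name, the grid placement of the basis centers, and the width value are illustrative assumptions, not the paper's settings, and the paper's actual training procedure (the online EM algorithm and eligibility traces) is not shown here.

```python
import numpy as np

class NormalizedGaussianNetwork:
    """Minimal normalized Gaussian (RBF) network: the output is a weighted sum
    of Gaussian basis activations, normalized so the activations sum to one."""

    def __init__(self, centers, width, n_outputs, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        self.centers = np.asarray(centers, dtype=float)       # (n_basis, state_dim)
        self.width = float(width)                             # common basis width
        self.weights = 0.01 * rng.standard_normal((len(self.centers), n_outputs))

    def activations(self, x):
        # Gaussian activations, normalized to sum to one over all basis functions
        sq_dist = np.sum((self.centers - x) ** 2, axis=1)
        g = np.exp(-sq_dist / (2.0 * self.width ** 2))
        return g / (np.sum(g) + 1e-12)

    def __call__(self, x):
        # Network output: normalized activations weighted by the output weights
        return self.activations(x) @ self.weights

# Example: a critic over the Acrobot's state (two joint angles and two angular
# velocities). Center placement and width are illustrative choices only.
angles = np.linspace(-np.pi, np.pi, 5)
vels = np.linspace(-4.0, 4.0, 5)
centers = np.array([[a1, a2, v1, v2]
                    for a1 in angles for a2 in angles
                    for v1 in vels for v2 in vels])
critic = NormalizedGaussianNetwork(centers, width=1.0, n_outputs=1)
state = np.array([0.1, -0.2, 0.0, 0.5])
print(critic(state))   # estimated value of this state
```

In an actor-critic arrangement of this kind, a second network of the same form would map the state to a control torque for the actuated joint, while the critic's value estimate drives the temporal-difference learning signal.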