Application of sequential reinforcement learning to control dynamic systems

The article describes the structure of a neural reinforcement learning controller based on the approach of asynchronous dynamic programming. The learning controller is applied to a well-known benchmark problem, the cart-pole system. In crucial contrast to previous approaches, the goal of learning is not only to avoid failure but also to stabilize the cart in the middle of the track with the pole in an upright position. The aim is to learn high-quality control trajectories, as known from conventional controller design, while providing only a minimal amount of a priori knowledge and teaching information.
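To make the asynchronous dynamic programming idea concrete, the sketch below shows the kind of backup such a controller performs on the cart-pole task. All specifics here are illustrative assumptions, not the paper's implementation: the physical constants, the discretized value table, the action set, and the quadratic cost (which penalizes distance from the track center and from the upright pole position, rather than only failure, matching the learning goal stated above).

```python
import numpy as np

# Standard cart-pole equations of motion (Euler step); the constants are
# the common benchmark values, assumed here for illustration.
GRAVITY, M_CART, M_POLE, POLE_LEN, DT = 9.8, 1.0, 0.1, 0.5, 0.02

def cart_pole_step(state, force):
    """One Euler integration step of the cart-pole dynamics."""
    x, x_dot, theta, theta_dot = state
    total_m = M_CART + M_POLE
    sin_t, cos_t = np.sin(theta), np.cos(theta)
    temp = (force + M_POLE * POLE_LEN * theta_dot**2 * sin_t) / total_m
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_LEN * (4.0 / 3.0 - M_POLE * cos_t**2 / total_m))
    x_acc = temp - M_POLE * POLE_LEN * theta_acc * cos_t / total_m
    return np.array([x + DT * x_dot, x_dot + DT * x_acc,
                     theta + DT * theta_dot, theta_dot + DT * theta_acc])

def cost(state):
    """Immediate cost: penalizes deviation from the track center and from
    the upright pole position, not merely failure (hypothetical weights)."""
    x, _, theta, _ = state
    return x**2 + 10.0 * theta**2

# Coarse discretization of the 4-D state space (an assumption; the paper
# uses a neural value-function approximator instead of a table).
BINS = (7, 5, 9, 5)
LOWS = np.array([-2.4, -2.0, -0.4, -2.0])
HIGHS = np.array([2.4, 2.0, 0.4, 2.0])
ACTIONS = [-10.0, 0.0, 10.0]   # illustrative force choices in newtons
GAMMA = 0.95

def to_index(state):
    clipped = np.clip(state, LOWS, HIGHS - 1e-9)
    frac = (clipped - LOWS) / (HIGHS - LOWS)
    return tuple((frac * np.array(BINS)).astype(int))

V = np.zeros(BINS)

def async_dp_sweep(n_updates, rng):
    """Asynchronous value iteration: back up randomly chosen states one at
    a time instead of sweeping the whole state space synchronously."""
    for _ in range(n_updates):
        state = LOWS + rng.random(4) * (HIGHS - LOWS)
        backups = [cost(state) + GAMMA * V[to_index(cart_pole_step(state, a))]
                   for a in ACTIONS]
        V[to_index(state)] = min(backups)

rng = np.random.default_rng(0)
async_dp_sweep(50_000, rng)
print("value near the goal state:", V[to_index(np.zeros(4))])
```

In the paper's setting a neural network takes the place of the table; the tabular version above is only meant to show the asynchronous backup pattern, in which states are updated one at a time rather than in full synchronous sweeps.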
