Batch Reinforcement Learning

Batch reinforcement learning is a subfield of dynamic-programming-based reinforcement learning. It was originally defined as the task of learning the best possible policy from a fixed set of transition samples known a priori; the (batch) algorithms developed in this field, however, can easily be adapted to the classical online case, in which the agent interacts with the environment while learning. Owing to their efficient use of collected data and the stability of their learning process, these methods have recently attracted considerable attention. In this chapter, we introduce the basic principles and the theory behind batch reinforcement learning, describe the most important algorithms, discuss selected examples of ongoing research in this field, and briefly survey real-world applications of batch reinforcement learning.
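To make the batch setting concrete, here is a minimal sketch of learning a policy purely from a fixed set of stored transitions (s, a, r, s'), in the spirit of fitted Q iteration. It is an illustration under stated assumptions, not the chapter's algorithm: the function names `batch_q_iteration` and `greedy_policy`, the three-state chain, the discount factor γ = 0.9, and the iteration count are all hypothetical. A lookup table stands in for the function approximator (for example a neural network or regression trees) that batch methods typically use with continuous states.

```python
from collections import defaultdict

# Each stored sample is (state, action, reward, next_state, terminal).
# The batch is fixed: the loop below never collects new transitions.


def batch_q_iteration(batch, actions, gamma=0.9, iterations=50):
    """Repeatedly re-fit Q-values to Bellman targets computed on a fixed batch."""
    q = defaultdict(float)  # Q(s, a); unseen pairs default to 0.0
    for _ in range(iterations):
        sums = defaultdict(float)
        counts = defaultdict(int)
        # Step 1: compute a training target for every stored transition
        # using the current Q estimate.
        for s, a, r, s_next, terminal in batch:
            if terminal:
                target = r
            else:
                target = r + gamma * max(q[(s_next, b)] for b in actions)
            sums[(s, a)] += target
            counts[(s, a)] += 1
        # Step 2: "fit" the new Q-function. For a lookup table, the
        # least-squares fit of the targets of each (s, a) pair is their mean.
        q = defaultdict(float, {k: sums[k] / counts[k] for k in sums})
    return q


def greedy_policy(q, states, actions):
    """Pick, in each state, the action with the highest learned Q-value."""
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


# Hypothetical chain: states 0 and 1, goal state 2; action 0 = left, 1 = right.
BATCH = [
    (0, 1, 0.0, 1, False),  # right from 0 reaches 1
    (0, 0, 0.0, 0, False),  # left from 0 stays at 0
    (1, 0, 0.0, 0, False),  # left from 1 returns to 0
    (1, 1, 1.0, 2, True),   # right from 1 reaches the goal, reward 1
]

q_values = batch_q_iteration(BATCH, actions=[0, 1])
print(greedy_policy(q_values, states=[0, 1], actions=[0, 1]))  # {0: 1, 1: 1}
```

Because the table keeps one value per (s, a) pair, the fitting step reduces to averaging. Replacing that step with a regressor trained on all (s, a) → target pairs gives the fitted variant used when states are continuous, while the outer loop over the same fixed batch stays unchanged.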
