Reinforcement learning in a nutshell

We provide a concise introduction to the basic approaches to reinforcement learning from a machine learning perspective. The focus is on value-function and policy-gradient methods. Selected recent trends are highlighted.
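
To fix ideas on the value-function side, the following is a minimal sketch of tabular Q-learning, the canonical value-function method; policy-gradient methods instead adjust the parameters of a parameterized policy along an estimate of the performance gradient. The three-state chain environment and all parameter values below are illustrative assumptions, not taken from the paper.

    import random

    # Toy continuing MDP: a 3-state chain. Action 1 moves right; action 0
    # (or standing at the rightmost state) returns the agent to the start.
    # Reaching the rightmost state yields reward 1, otherwise reward 0.
    n_states, n_actions = 3, 2
    alpha, gamma, eps = 0.1, 0.95, 0.1  # step size, discount, exploration rate

    def step(s, a):
        if s == n_states - 1 or a == 0:
            return 0, 0.0
        s2 = s + 1
        return s2, (1.0 if s2 == n_states - 1 else 0.0)

    Q = [[0.0] * n_actions for _ in range(n_states)]  # tabular action values

    s = 0
    for _ in range(5000):
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda b: Q[s][b])
        s2, r = step(s, a)
        # Q-learning update: move Q(s,a) toward the bootstrapped
        # target r + gamma * max_a' Q(s', a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

    print(Q)

Under these (assumed) settings the greedy policy learned from Q is to keep moving right, which matches the obvious optimal behaviour in this toy chain.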
