Reinforcement learning in a nutshell

We provide a concise introduction to the basic approaches to reinforcement learning from a machine learning perspective. The focus is on value-function and policy-gradient methods. Selected recent trends are highlighted.
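
To fix ideas on the value-function side, the following is a minimal sketch of tabular Q-learning, the canonical value-function method; policy-gradient methods instead adjust the parameters of a parameterized policy along an estimate of the performance gradient. The three-state chain environment and all parameter values below are illustrative assumptions, not taken from the paper.

    import random

    # Toy continuing MDP: a 3-state chain. Action 1 moves right; action 0
    # (or standing at the rightmost state) returns the agent to the start.
    # Reaching the rightmost state yields reward 1, otherwise reward 0.
    n_states, n_actions = 3, 2
    alpha, gamma, eps = 0.1, 0.95, 0.1  # step size, discount, exploration rate

    def step(s, a):
        if s == n_states - 1 or a == 0:
            return 0, 0.0
        s2 = s + 1
        return s2, (1.0 if s2 == n_states - 1 else 0.0)

    Q = [[0.0] * n_actions for _ in range(n_states)]  # tabular action values

    s = 0
    for _ in range(5000):
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda b: Q[s][b])
        s2, r = step(s, a)
        # Q-learning update: move Q(s,a) toward the bootstrapped
        # target r + gamma * max_a' Q(s', a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

    print(Q)

Under these (assumed) settings the greedy policy learned from Q is to keep moving right, which matches the obvious optimal behaviour in this toy chain.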
