Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping
Alborz Geramifard | Michael H. Bowling | Richard S. Sutton | Csaba Szepesvári
[1] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[2] Satinder P. Singh, et al. Reinforcement Learning with a Hierarchy of Abstract Models, 1992, AAAI.
[3] Andrew W. Moore, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, 1993, Machine Learning.
[4] Christopher G. Atkeson, et al. Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming, 1993, NIPS.
[5] Jing Peng, et al. Efficient Learning and Planning Within the Dyna Framework, 1993, Adapt. Behav.
[6] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1995, NIPS.
[7] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[8] Richard S. Sutton, et al. Model-Based Reinforcement Learning with an Approximate, Learned Model, 1996.
[9] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996.
[10] B. Delyon. General results on the convergence of stochastic algorithms, 1996, IEEE Trans. Autom. Control.
[11] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998.
[12] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[13] Craig Boutilier, et al. Stochastic dynamic programming with factored representations, 2000, Artif. Intell.
[14] Jonathan Schaeffer, et al. Temporal Difference Learning Applied to a High-Performance Game-Playing Program, 2001, IJCAI.
[15] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[16] Steven J. Bradtke, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 2004, Machine Learning.
[17] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[18] Richard S. Sutton, et al. Learning to Predict by the Methods of Temporal Differences, 1988, Machine Learning.
[19] Geoffrey J. Gordon, et al. Fast Exact Planning in Markov Decision Processes, 2005, ICAPS.
[20] Kevin D. Seppi, et al. Prioritization Methods for Accelerating MDP Solvers, 2005, J. Mach. Learn. Res.
[21] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[22] Olivier Sigaud, et al. Learning the Structure of Factored Markov Decision Processes in Reinforcement Learning Problems, 2006, ICML.
[23] Alborz Geramifard, et al. Incremental Least-Squares Temporal Difference Learning, 2006, AAAI.
[24] Richard S. Sutton, et al. Reinforcement Learning of Local Shape in the Game of Go, 2007, IJCAI.
[25] Alborz Geramifard, et al. Sigma Point Policy Iteration, 2008, AAMAS.