A comparison of direct and model-based reinforcement learning
[1] K. J. Craik et al. The nature of explanation, 1944.
[2] Robert E. Larson et al. State increment dynamic programming, 1968.
[3] Michael I. Jordan et al. Learning to Control an Unstable System with Forward Modeling, 1989, NIPS.
[4] Vijaykumar Gullapalli et al. A stochastic reinforcement learning algorithm for learning real-valued functions, 1990, Neural Networks.
[5] Richard S. Sutton et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[6] Richard S. Sutton et al. Reinforcement Learning is Direct Adaptive Optimal Control, 1992, 1991 American Control Conference.
[7] Peter J. Millington et al. Associative reinforcement learning for optimal control, 1991.
[8] Richard S. Sutton et al. Dyna, an integrated architecture for learning, planning, and reacting, 1990, SGAR.
[9] Richard S. Sutton et al. Planning by Incremental Dynamic Programming, 1991, ML.
[10] Steven J. Bradtke et al. Reinforcement Learning Applied to Linear Quadratic Regulation, 1992, NIPS.
[11] R. J. Williams et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[12] C. Atkeson et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, 1993, Machine Learning.
[13] Christopher G. Atkeson et al. Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming, 1993, NIPS.
[14] Andrew W. Moore et al. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces, 1995, Machine Learning.
[15] Leemon C. Baird et al. Reinforcement Learning With High-Dimensional, Continuous Actions, 1993.
[16] R. E. Eckmiller et al. To swing up an inverted Pendulum using stochastic real-valued Reinforcement Learning, 1994.
[17] Andrew G. Barto et al. Adaptive linear quadratic control using policy iteration, 1994, Proceedings of 1994 American Control Conference - ACC '94.
[18] Claude-Nicolas Fiechter et al. Efficient reinforcement learning, 1994, COLT '94.
[19] Gerald Tesauro et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.
[20] Mark W. Spong et al. The swing up control problem for the Acrobot, 1995.
[21] Kenji Doya et al. Temporal Difference Learning in Continuous Time and Space, 1995, NIPS.
[22] Gerald Tesauro et al. Temporal difference learning and TD-Gammon, 1995, CACM.
[23] Andrew G. Barto et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[24] Wei Zhang et al. A Reinforcement Learning Approach to job-shop Scheduling, 1995, IJCAI.
[25] Richard S. Sutton et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1995, NIPS.
[26] Andrew G. Barto et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[27] Andrew W. Moore et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[28] John Rust. Numerical dynamic programming in economics, 1996.
[29] M. Hasselmo et al. Temporal Difference Learning in Continuous Time and Space, 1996.
[30] H. Bersini et al. Three connectionist implementations of dynamic programming for optimal control: a preliminary comparative analysis, 1996, Proceedings of International Workshop on Neural Networks for Identification, Control, Robotics and Signal/Image Processing.
[31] Gary Boone et al. Efficient reinforcement learning: model-based Acrobot control, 1997, Proceedings of International Conference on Robotics and Automation.
[32] Ashwin Ram et al. Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces, 1997, Adapt. Behav.
[33] Stefan Schaal et al. Learning tasks from a single demonstration, 1997, Proceedings of International Conference on Robotics and Automation.
[34] Edward Grant et al. Learning Control, 1993, Encyclopedia of Machine Learning.