Experiments with Infinite-Horizon, Policy-Gradient Estimation
暂无分享,去创建一个
[1] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
[2] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[3] Peter W. Glynn,et al. Stochastic approximation for Monte Carlo optimization , 1986, WSC '86.
[4] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[5] Michael I. Jordan,et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes , 1994, ICML.
[6] Michael I. Jordan,et al. Reinforcement Learning with Soft State Aggregation , 1994, NIPS.
[7] Shigenobu Kobayashi,et al. Reinforcement Learning by Stochastic Hill Climbing on Discounted Reward , 1995, ICML.
[8] Wei Zhang,et al. A Reinforcement Learning Approach to job-shop Scheduling , 1995, IJCAI.
[9] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[10] Dimitri P. Bertsekas,et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems , 1996, NIPS.
[11] Shigenobu Kobayashi,et al. Reinforcement Learning in POMDPs with Function Approximation , 1997, ICML.
[12] Xi-Ren Cao,et al. Perturbation realization, potentials, and sensitivity analysis of Markov processes , 1997, IEEE Trans. Autom. Control..
[13] Xi-Ren Cao,et al. Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization , 1998, IEEE Trans. Control. Syst. Technol..
[14] P. Marbach. Simulation-Based Methods for Markov Decision Processes , 1998 .
[15] Shigenobu Kobayashi,et al. An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function , 1998, ICML.
[16] Andrew W. Moore,et al. Gradient Descent for General Reinforcement Learning , 1998, NIPS.
[17] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[18] Reuven Y. Rubinstein,et al. Modern simulation and modeling , 1998 .
[19] John N. Tsitsiklis,et al. Simulation-based optimization of Markov reward processes , 1998, Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171).
[20] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[21] Terrence L. Fine,et al. Feedforward Neural Network Methodology , 1999, Information Science and Statistics.
[22] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[23] Jonathan Baxter,et al. Reinforcement Learning From State and Temporal Differences , 1999 .
[24] Arthur L. Samuel,et al. Some studies in machine learning using the game of checkers , 2000, IBM J. Res. Dev..
[25] Peter L. Bartlett,et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent , 2000, ICML.
[26] P. Bartlett,et al. Stochastic optimization of controlled partially observable Markov decision processes , 2000, Proceedings of the 39th IEEE Conference on Decision and Control (Cat. No.00CH37187).
[27] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[28] Lex Weaver,et al. A Multi-Agent Policy-Gradient Approach to Network Routing , 2001, ICML.
[29] Peter L. Bartlett,et al. Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning , 2000, J. Comput. Syst. Sci..