Natural actor-critic algorithms
Shalabh Bhatnagar | Mark Lee | Richard S. Sutton | Mohammad Ghavamzadeh
[1] John Rust. Numerical dynamic programming in economics, 1996.
[2] Peter Dayan, et al. Analytical Mean Squared Error Curves for Temporal Difference Learning, 1996, Machine Learning.
[3] Peter W. Glynn, et al. Likelihood ratio gradient estimation for stochastic systems, 1990, CACM.
[4] Shalabh Bhatnagar, et al. Adaptive Newton-based multivariate smoothed functional algorithms for simulation optimization, 2007, TOMC.
[5] V. Borkar. Recursive self-tuning control of finite Markov chains, 1997.
[6] Michael I. Jordan, et al. On the Convergence of Temporal-Difference Learning with Linear Function Approximation, 2001.
[7] Morris W. Hirsch, et al. Convergent activation dynamics in continuous time networks, 1989, Neural Networks.
[8] Stefan Schaal, et al. Reinforcement Learning for Humanoid Robotics, 2003.
[9] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[10] S. Andradóttir, et al. A Simulated Annealing Algorithm with Constant Temperature for Discrete Stochastic Optimization, 1999.
[11] Solomon Lefschetz, et al. Stability by Liapunov's Direct Method With Applications, 1962.
[12] Shalabh Bhatnagar, et al. Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes, 2007, Discret. Event Dyn. Syst.
[13] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009.
[14] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[15] Jeff G. Schneider, et al. Covariant Policy Search, 2003, IJCAI.
[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[17] Samy Bengio, et al. Variance Reduction Techniques in ..., 2003.
[18] Odile Brandière, et al. Some Pathological Traps for Stochastic Approximation, 1998.
[19] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[20] Jonathan Baxter. KnightCap: A chess program that learns by combining TD(λ) with game-tree search, 1998.
[21] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[22] Vivek S. Borkar, et al. Learning Algorithms for Markov Decision Processes with Average Cost, 2001, SIAM J. Control. Optim.
[23] Vladislav Tadic, et al. On the Convergence of Temporal-Difference Learning with Linear Function Approximation, 2001, Machine Learning.
[24] A. Barto, et al. Improved Temporal Difference Methods with Linear Function Approximation, 2004.
[25] Shalabh Bhatnagar, et al. A simultaneous perturbation stochastic approximation-based actor-critic algorithm for Markov decision processes, 2004, IEEE Transactions on Automatic Control.
[26] J. Spall. Stochastic Optimization, 2002.
[27] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[28] Xi-Ren Cao, et al. Perturbation realization, potentials, and sensitivity analysis of Markov processes, 1997, IEEE Trans. Autom. Control.
[29] Vivek S. Borkar, et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control. Optim.
[30] Dirk Henkemans, et al. C++ Programming for the Absolute Beginner, 2001.
[31] Robert M. Glorioso, et al. Engineering Cybernetics, 1975.
[32] John N. Tsitsiklis, et al. Parallel and distributed computation, 1989.
[33] Benjamin Van Roy, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[34] Peter Stone, et al. Policy gradient reinforcement learning for fast quadrupedal locomotion, 2004, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '04).
[35] John N. Tsitsiklis, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[36] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control. Optim.
[37] Harold J. Kushner, et al. Stochastic Approximation Algorithms and Applications, 1997, Applications of Mathematics.
[38] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 2001, IEEE Trans. Autom. Control.
[39] D. J. White, et al. A Survey of Applications of Markov Decision Processes, 1993.
[40] Sridhar Mahadevan, et al. Hierarchical Policy Gradient Algorithms, 2003, ICML.
[41] R. Bellman, et al. Functional Approximations and Dynamic Programming, 1959.
[42] Abraham Thomas, et al. Learning Algorithms for Markov Decision Processes, 2009.
[43] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[44] Shie Mannor, et al. Regularized Policy Iteration, 2008, NIPS.
[45] Andrew Tridgell, et al. KnightCap: A Chess Program That Learns by Combining TD(λ) with Game-Tree Search, 1998, ICML.
[46] Gerald Tesauro, et al. Temporal difference learning and TD-Gammon, 1995, CACM.
[47] Shalabh Bhatnagar, et al. Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization, 2005, TOMC.
[48] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[49] K. I. M. McKinnon, et al. On the Generation of Markov Decision Processes, 1995.
[50] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[51] Stefan Schaal, et al. Reinforcement learning of motor skills with policy gradients, 2008, Neural Networks.
[52] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009.
[53] Odile Brandière. Some Pathological Traps for Stochastic Approximation, 1998.
[54] Peter L. Bartlett, et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[55] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[56] Vivek S. Borkar, et al. Reinforcement Learning — A Bridge Between Numerical Methods and Monte Carlo, 2009.
[57] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[58] Pierre Priouret, et al. Adaptive Algorithms and Stochastic Approximations, 1990, Applications of Mathematics.
[59] Harold J. Kushner, et al. Stochastic Approximation Methods for Constrained and Unconstrained Systems, 1978.
[60] Dimitri P. Bertsekas. Nonlinear Programming, 1995.
[61] D. Rogers, et al. Variance-Reduction Techniques, 1988.
[62] Andrew G. Barto, et al. Elevator Group Control Using Multiple Reinforcement Learning Agents, 1998, Machine Learning.
[63] Shalabh Bhatnagar, et al. Incremental Natural Actor-Critic Algorithms, 2007, NIPS.
[64] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[65] M. Narasimha Murty, et al. Information theoretic justification of Boltzmann selection and its generalization to Tsallis case, 2005, IEEE Congress on Evolutionary Computation.
[66] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[67] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[68] Ben Tse, et al. Autonomous Inverted Helicopter Flight via Reinforcement Learning, 2004, ISER.
[69] Sean P. Meyn. Control Techniques for Complex Networks: Workload, 2007.
[70] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[71] Vijay R. Konda, et al. On Actor-Critic Algorithms, 2003, SIAM J. Control. Optim.
[72] James W. Daniel, et al. Splines and efficiency in dynamic programming, 1976.
[73] Thomas Hofmann, et al. Natural Actor-Critic for Road Traffic Optimisation, 2007.
[74] Andrew G. Barto, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 2005, Machine Learning.
[75] M. Kurano. Learning Algorithms for Markov Decision Processes, 1987.
[76] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[77] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[78] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[79] S. Thomas Alexander, et al. Adaptive Signal Processing, 1986, Texts and Monographs in Computer Science.
[80] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[81] John N. Tsitsiklis, et al. Asynchronous stochastic approximation and Q-learning, 1993, Proceedings of the 32nd IEEE Conference on Decision and Control.
[82] R. Pemantle, et al. Nonconvergence to Unstable Points in Urn Models and Stochastic Approximations, 1990.
[83] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[84] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1995, NIPS.
[85] Mohammad Ghavamzadeh, et al. Bayesian actor-critic algorithms, 2007, ICML '07.
[86] Mohammad Ghavamzadeh, et al. Bayesian Policy Gradient Algorithms, 2006, NIPS.
[87] V. Borkar. Stochastic approximation with two time scales, 1997.
[88] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[89] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[90] J. Tsitsiklis, et al. An optimal one-way multigrid algorithm for discrete-time stochastic control, 1991.