Q-learning and enhanced policy iteration in discounted dynamic programming
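The paper studies Q-learning and new policy iteration variants for discounted Markov decision problems. For orientation, the classical Q-learning iteration (Watkins, reference [6] below) updates Q(s,a) toward r + γ max_{a'} Q(s',a'). The sketch below is a minimal textbook version of that update under assumed names, not the enhanced policy iteration proposed in the paper; the environment sampler step(s, a) and all parameter values are hypothetical illustrations.

# Minimal sketch of classical tabular Q-learning for a discounted MDP
# (Watkins [6]); illustrative only -- not the paper's enhanced algorithm.
import random

def q_learning(states, actions, step, gamma=0.95, alpha=0.1,
               epsilon=0.1, episodes=500, horizon=100):
    """step(s, a) -> (reward, next_state) is an assumed environment sampler."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = random.choice(states)
        for _ in range(horizon):
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            r, s2 = step(s, a)
            # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            target = r + gamma * max(Q[(s2, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q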
[1] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[2] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Vol. II, 1976.
[3] Richard E. Rosenthal, et al. Stochastic Dynamic Location Analysis, 1978.
[4] Dimitri P. Bertsekas, et al. Distributed asynchronous computation of fixed points, 1983, Math. Program.
[5] John N. Tsitsiklis, et al. Distributed Asynchronous Deterministic and Stochastic Gradient Optimization Algorithms, 1984, American Control Conference.
[6] C. Watkins. Learning from delayed rewards, 1989.
[7] John N. Tsitsiklis, et al. Parallel and Distributed Computation, 1989.
[8] John N. Tsitsiklis, et al. Asynchronous stochastic approximation and Q-learning, 1993, Proceedings of the 32nd IEEE Conference on Decision and Control.
[9] Ronald J. Williams, et al. Analysis of Some Incremental Variants of Policy Iteration: First Steps Toward Understanding Actor-Critic Learning Systems, 1993.
[10] Michael I. Jordan, et al. Technical report, MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[11] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[12] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[13] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[14] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[15] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996.
[16] S. Ioffe, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1996.
[17] Vivek S. Borkar, et al. Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms, 1997, SIAM J. Control. Optim.
[18] V. Borkar. Asynchronous Stochastic Approximations, 1998.
[19] John N. Tsitsiklis, et al. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives, 1999, IEEE Trans. Autom. Control.
[20] John S. Baras, et al. A learning algorithm for Markov decision processes with adaptive state aggregation, 2000, Proceedings of the 39th IEEE Conference on Decision and Control.
[21] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[22] Ying He, et al. Simulation-Based Algorithms for Markov Decision Processes, 2002.
[23] John N. Tsitsiklis, et al. On the Convergence of Optimistic Policy Iteration, 2002, J. Mach. Learn. Res.
[24] Abhijit Gosavi, et al. Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning, 2003.
[25] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[26] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 2004, Machine Learning.
[27] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[28] Shie Mannor, et al. Basis Function Adaptation in Temporal Difference Reinforcement Learning, 2005, Ann. Oper. Res.
[29] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[30] David Choi, et al. A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning, 2001, Discret. Event Dyn. Syst.
[31] Sean P. Meyn. Control Techniques for Complex Networks, 2007.
[32] Xi-Ren Cao, et al. Stochastic learning and optimization - A sensitivity-based approach, 2007, Annu. Rev. Control.
[33] Jiaqiao Hu, et al. Simulation-based Algorithms for Markov Decision Processes (Communications and Control Engineering), 2007.
[34] T. Jung, et al. Kernelizing LSPE(λ), 2007, IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[35] Xi-Ren Cao, et al. Stochastic Learning and Optimization: A Sensitivity-Based Approach (International Series on Discrete Event Dynamic Systems), 2007.
[36] Warren B. Powell, et al. Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2007.
[37] D. Bertsekas, et al. A Least Squares Q-Learning Algorithm for Optimal Stopping Problems, 2007.
[38] D. Bertsekas, et al. Q-learning algorithms for optimal stopping based on least squares, 2007, European Control Conference (ECC).
[39] R. Sutton, et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation, 2008, NIPS.
[40] Shalabh Bhatnagar, et al. New algorithms of the Q-learning type, 2008, Autom.
[41] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008, Texts and Readings in Mathematics.
[42] Panos M. Pardalos, et al. Approximate dynamic programming: solving the curses of dimensionality, 2009, Optim. Methods Softw.
[43] Dimitri P. Bertsekas, et al. Basis function adaptation methods for cost approximation in MDP, 2009, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[44] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.
[45] Shalabh Bhatnagar, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[46] D. Bertsekas, et al. Projected Equation Methods for Approximate Solution of Large Linear Systems, 2009, J. Comput. Appl. Math.
[47] Dimitri P. Bertsekas, et al. Distributed asynchronous policy iteration in dynamic programming, 2010, 48th Annual Allerton Conference on Communication, Control, and Computing.
[48] Dimitri P. Bertsekas, et al. Q-learning and enhanced policy iteration in discounted dynamic programming, 2010, CDC.
[49] Bart De Schutter, et al. Online least-squares policy iteration for reinforcement learning control, 2010, American Control Conference.
[50] Huizhen Yu, et al. Convergence of Least Squares Temporal Difference Methods Under General Conditions, 2010, ICML.
[51] Bart De Schutter, et al. Reinforcement Learning and Dynamic Programming Using Function Approximators, 2010.
[52] Dimitri P. Bertsekas. Approximate Dynamic Programming, 2017, Encyclopedia of Machine Learning and Data Mining.
[53] Benjamin Van Roy. On Regression-Based Stopping Times, 2010, Discret. Event Dyn. Syst.
[54] Dimitri P. Bertsekas. Williams-Baird Counterexample for Q-Factor Asynchronous Policy Iteration, 2010.
[55] D. Bertsekas. Approximate policy iteration: a survey and some new methods, 2011.
[56] Huizhen Yu, et al. Least Squares Temporal Difference Methods: An Analysis under General Conditions, 2012, SIAM J. Control. Optim.
[57] Richard S. Sutton, et al. Reinforcement Learning, 1992, Handbook of Machine Learning.
[58] J. Walrand, et al. Distributed Dynamic Programming.