Approximate policy iteration: a survey and some new methods