Error Bounds for Approximate Policy Iteration
[1] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[2] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[3] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[4] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[5] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[6] K. Judd. Numerical Methods in Economics, 1998.
[7] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[8] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[9] Daphne Koller, et al. Policy Iteration for Factored MDPs, 2000, UAI.
[10] Carlos Guestrin, et al. Max-norm Projections for Factored MDPs, 2001, IJCAI.
[11] Michail G. Lagoudakis, et al. Model-Free Least-Squares Policy Iteration, 2001, NIPS.
[12] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[13] Ralf Schoknecht, et al. Optimality of Reinforcement Learning Algorithms with Linear Function Approximation, 2002, NIPS.
[14] Doina Precup, et al. A Convergent Form of Approximate Policy Iteration, 2002, NIPS.
[15] Steven J. Bradtke, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 2004, Machine Learning.
[16] Andrew W. Moore, et al. Variable Resolution Discretization in Optimal Control, 2002, Machine Learning.
[17] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.