Fitted Q-iteration in continuous action-space MDPs
[1] A. Kolmogorov,et al. Entropy and ε-capacity of sets in functional spaces , 1961 .
[2] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[3] Bin Yu. Rates of Convergence for Empirical Processes of Stationary Mixing Sequences , 1994 .
[4] Philip M. Long,et al. Fat-shattering and the learnability of real-valued functions , 1994, COLT '94.
[5] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[6] David Haussler,et al. Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension , 1995, J. Comb. Theory, Ser. A.
[7] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[8] Peter L. Bartlett,et al. Learning in Neural Networks: Theoretical Foundations , 1999 .
[9] Peter L. Bartlett,et al. Neural Network Learning - Theoretical Foundations , 1999 .
[10] Nello Cristianini,et al. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .
[11] Michael I. Jordan,et al. PEGASUS: A policy search method for large MDPs and POMDPs , 2000, UAI.
[12] Leonid Peshkin,et al. Learning from Scarce Experience , 2002, ICML.
[13] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[14] Ron Meir,et al. Nonparametric Time Series Prediction Through Adaptive Model Selection , 2000, Machine Learning.
[15] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[16] Csaba Szepesvári,et al. Finite time bounds for sampling based fitted value iteration , 2005, ICML.
[17] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[18] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[19] Douglas Aberdeen,et al. Policy-Gradient Methods for Planning , 2005, NIPS.
[20] Liming Xiang,et al. Kernel-Based Reinforcement Learning , 2006, ICIC.
[21] Ambuj Tewari,et al. Sample Complexity of Policy Search with Known Dynamics , 2006, NIPS.
[22] Csaba Szepesvári,et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path , 2006, COLT.
[23] A. Antos,et al. Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[24] Peter Stone,et al. Batch reinforcement learning in a complex domain , 2007, AAMAS '07.
[25] Dimitri P. Bertsekas,et al. Stochastic optimal control: the discrete time case , 2007 .
[26] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.