Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning
[1] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.
[2] Peter W. Glynn, et al. Stochastic approximation for Monte Carlo optimization, 1986, WSC '86.
[3] Alan Weiss, et al. Sensitivity analysis via likelihood ratios, 1986, WSC '86.
[4] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[5] Dharmendra S. Modha, et al. Minimum complexity regression estimation with weakly dependent observations, 1996, IEEE Trans. Inf. Theory.
[6] Shigenobu Kobayashi, et al. Reinforcement Learning in POMDPs with Function Approximation, 1997, ICML.
[7] Xi-Ren Cao, et al. Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization, 1998, IEEE Trans. Control Syst. Technol.
[8] P. Marbach. Simulation-Based Methods for Markov Decision Processes, 1998.
[9] Andrew W. Moore, et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.
[10] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 1998, Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No. 98CH36171).
[11] P. Kumar, et al. Learning dynamical systems in a stationary environment, 1998.
[12] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[13] Peter L. Bartlett, et al. Neural Network Learning - Theoretical Foundations, 1999.
[14] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[15] P. Bartlett, et al. Direct Gradient-Based Reinforcement Learning: I. Gradient Estimation Algorithms, 1999.
[16] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[17] Ron Meir, et al. Nonparametric Time Series Prediction Through Adaptive Model Selection, 2000, Machine Learning.