Error Reducing Sampling in Reinforcement Learning
[1] Rémi Munos. Efficient Resources Allocation for Markov Decision Processes, 2001, NIPS.
[2] Richard S. Sutton, et al. Dyna, an integrated architecture for learning, planning, and reacting, 1990, SGAR.
[3] U. Rieder, et al. Markov Decision Processes, 2010.
[4] Shie Mannor, et al. Action Elimination and Stopping Conditions for Reinforcement Learning, 2003, ICML.
[5] Pravin Varaiya, et al. Stochastic Systems: Estimation, Identification, and Adaptive Control, 1986.
[6] Andrew W. Moore, et al. Rates of Convergence for Variable Resolution Schemes in Optimal Control, 2000, ICML.
[7] Michael Kearns, et al. Efficient Reinforcement Learning in Factored MDPs, 1999, IJCAI.
[8] Shlomo Zilberstein, et al. Planetary Rover Control as a Markov Decision Process, 2002.
[9] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[10] Patchigolla Kiran Kumar, et al. A Survey of Some Results in Stochastic Adaptive Control, 1985.
[11] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.
[12] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[13] Gerald Tesauro, et al. Temporal difference learning and TD-Gammon, 1995, CACM.
[14] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[15] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[16] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.