Off-policy learning based on weighted importance sampling with linear computational complexity
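As a brief illustration of the estimator named in the title: weighted importance sampling (WIS) normalizes the importance-ratio-weighted returns by the sum of the ratios rather than by the sample count, trading the unbiasedness of ordinary importance sampling (OIS) for substantially lower variance. The sketch below is a minimal batch version for intuition only; it is not the linear-complexity online algorithm the paper itself develops, and the function names are illustrative.

```python
def wis_estimate(returns, rhos):
    """Weighted importance sampling estimate of a value.

    returns: per-trajectory returns G_i sampled under the behavior policy.
    rhos: per-trajectory importance ratios rho_i = pi(traj)/mu(traj).
    Normalizes by the sum of the ratios, so the estimate is a convex
    combination of the observed returns (biased, but lower variance).
    """
    total = sum(rhos)
    if total == 0.0:
        return 0.0
    return sum(r * g for g, r in zip(returns, rhos)) / total


def ois_estimate(returns, rhos):
    """Ordinary importance sampling: unbiased, but high variance,
    because it divides by the sample count instead of the ratio sum."""
    return sum(r * g for g, r in zip(returns, rhos)) / len(returns)
```

For example, with returns `[1.0, 0.0]` and ratios `[3.0, 1.0]`, OIS gives 1.5 (outside the range of observed returns), while WIS gives 0.75, which always stays within the convex hull of the returns.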
[1] Christian R. Shelton,et al. Importance sampling for reinforcement learning with multiple objectives , 2001 .
[2] R. Sutton, et al. A new Q(λ) with interim forward view and Monte Carlo equivalence , 2014 .
[3] Richard S. Sutton,et al. Weighted importance sampling for off-policy learning with linear function approximation , 2014, NIPS.
[4] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[5] Hoon Kim,et al. Monte Carlo Statistical Methods , 2000, Technometrics.
[6] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[8] Matthieu Geist,et al. Off-policy learning with eligibility traces: a survey , 2013, J. Mach. Learn. Res..
[9] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[10] Patrick M. Pilarski,et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction , 2011, AAMAS.
[11] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[12] Christian P. Robert,et al. Monte Carlo Statistical Methods (Springer Texts in Statistics) , 2005 .
[13] Nir Friedman,et al. Probabilistic Graphical Models - Principles and Techniques , 2009 .
[15] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[16] R. Sutton,et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010 .
[17] Jan Peters,et al. Policy evaluation with temporal differences: a survey and comparison , 2015, J. Mach. Learn. Res..
[18] Rémi Munos,et al. Fast LSTD Using Stochastic Approximation: Finite Time Analysis and Application to Traffic Control , 2013, ECML/PKDD.
[19] Richard S. Sutton,et al. Off-policy TD(λ) with a true online equivalence , 2014, UAI.
[20] Thore Graepel,et al. A Comparison of learning algorithms on the Arcade Learning Environment , 2014, ArXiv.
[22] Tim Hesterberg,et al. Monte Carlo Strategies in Scientific Computing , 2002, Technometrics.
[23] R. Sutton,et al. Gradient temporal-difference learning algorithms , 2011 .
[24] Richard S. Sutton,et al. True online TD(λ) , 2014, ICML.
[25] H. Kahn,et al. Methods of Reducing Sample Size in Monte Carlo Computations , 1953, Oper. Res..
[26] Tom Schaul,et al. No more pesky learning rates , 2012, ICML.
[27] Patrick M. Pilarski,et al. Tuning-free step-size adaptation , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Reuven Y. Rubinstein,et al. Simulation and the Monte Carlo method , 1981, Wiley series in probability and mathematical statistics.