A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms
Philip Amortila | Doina Precup | Prakash Panangaden | Marc G. Bellemare
[1] I. Olkin, et al. Multivariate Chebyshev Inequalities, 1960.
[2] Richard L. Tweedie, et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.
[3] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[4] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[5] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control. Optim.
[6] John N. Tsitsiklis, et al. On the Convergence of Optimistic Policy Iteration, 2002, J. Mach. Learn. Res.
[7] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[8] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[9] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[10] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[11] Elizabeth L. Wilmer, et al. Markov Chains and Mixing Times, 2008.
[12] C. Villani. Optimal Transport: Old and New, 2008.
[13] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008, Texts and Readings in Mathematics.
[14] Shimon Whiteson, et al. A theoretical and empirical analysis of Expected Sarsa, 2009, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[15] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[16] Eduardo F. Morales, et al. An Introduction to Reinforcement Learning, 2011.
[17] R. Srikant, et al. Error bounds for constant step-size Q-learning, 2012, Syst. Control. Lett.
[18] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[19] Huizhen Yu, et al. Weak Convergence Properties of Constrained Emphatic Temporal-Difference Learning with Constant and Slowly Diminishing Stepsize, 2015, J. Mach. Learn. Res.
[20] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[21] F. Bach, et al. Bridging the gap between constant step size stochastic gradient descent and Markov chains, 2017, The Annals of Statistics.
[22] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[23] Csaba Szepesvári, et al. Linear Stochastic Approximation: Constant Step-Size and Iterate Averaging, 2017, arXiv.
[24] Jalaj Bhandari, et al. A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation, 2018, COLT.
[25] Prakash Panangaden, et al. Free Complete Wasserstein Algebras, 2018, Log. Methods Comput. Sci.
[26] R. Srikant, et al. Finite-Time Error Bounds for Linear Stochastic Approximation and TD Learning, 2019, COLT.
[27] Marc G. Bellemare, et al. A Comparative Analysis of Expected and Distributional Reinforcement Learning, 2019, AAAI.
[28] Thinh T. Doan, et al. Performance of Q-learning with Linear Function Approximation: Stability and Finite-Time Analysis, 2019.
[29] John-Paul Clarke, et al. Finite-Time Analysis of Q-Learning with Linear Function Approximation, 2019, arXiv.