Koulik Khamaru | Ashwin Pananjady | Feng Ruan | Martin J. Wainwright | Michael I. Jordan
[1] S. Kakade,et al. Reinforcement Learning: Theory and Algorithms , 2019 .
[2] Alexandre B. Tsybakov,et al. Introduction to Nonparametric Estimation , 2008, Springer series in statistics.
[3] H. Robbins,et al. A Stochastic Approximation Method , 1951, The Annals of Mathematical Statistics.
[4] Alexander Shapiro,et al. Robust Stochastic Approximation Approach to Stochastic Programming , 2009, SIAM J. Optim..
[5] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
[6] Martin J. Wainwright,et al. Stochastic approximation with cone-contractive operators: Sharp ℓ∞-bounds for Q-learning , 2019, ArXiv.
[7] Jan Peters,et al. Policy evaluation with temporal differences: a survey and comparison , 2015, J. Mach. Learn. Res..
[8] Xian Wu,et al. Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model , 2018, NeurIPS.
[9] Shie Mannor,et al. Finite Sample Analyses for TD(0) With Function Approximation , 2017, AAAI.
[10] Martin J. Wainwright,et al. Variance-reduced Q-learning is minimax optimal , 2019, ArXiv.
[11] Lucien Birgé. Approximation dans les espaces métriques et théorie de l'estimation , 1983 .
[12] D. Ruppert,et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process , 1988 .
[13] Thinh T. Doan,et al. Finite-Time Performance of Distributed Temporal Difference Learning with Linear Function Approximation , 2019, SIAM J. Math. Data Sci..
[14] Sean P. Meyn,et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning , 2000, SIAM J. Control. Optim..
[15] Emma Brunskill,et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds , 2019, ICML.
[16] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[17] W. Feller. An Introduction to Probability Theory and Its Applications, Volume I , 1951 .
[18] V. B. Tadic,et al. On the almost sure rate of convergence of linear stochastic approximation algorithms , 2004, IEEE Transactions on Information Theory.
[19] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008, Texts and Readings in Mathematics.
[20] Michael I. Jordan,et al. MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, technical report , 1996 .
[21] Martin J. Wainwright,et al. Stochastic approximation with cone-contractive operators: Sharp ℓ∞-bounds for Q-learning , 2019, arXiv:1905.06265.
[22] L. Le Cam. Limits of experiments , 1972 .
[23] Csaba Szepesvári,et al. Linear Stochastic Approximation: How Far Does Constant Step-Size and Iterate Averaging Go? , 2018, AISTATS.
[24] Martin J. Wainwright,et al. Instance-Dependent ℓ∞-Bounds for Policy Evaluation in Tabular Reinforcement Learning , 2021, IEEE Transactions on Information Theory.
[25] Jalaj Bhandari,et al. A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation , 2018, COLT.
[26] Lucien Birgé. Approximation dans les espaces métriques et théorie de l'estimation , 1983 .
[27] L. Le Cam,et al. Asymptotics in Statistics: Some Basic Concepts , 2002 .
[28] C. Stein. Efficient Nonparametric Testing and Estimation , 1956 .
[29] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[30] J. Hájek. Local asymptotic minimax and admissibility in estimation , 1972 .
[31] Martin J. Wainwright,et al. Value function estimation in Markov reward processes: Instance-dependent ℓ∞-bounds for policy evaluation , 2019, ArXiv.
[32] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[33] Shie Mannor,et al. "How hard is my MDP?" The distribution-norm to the rescue , 2014, NIPS.
[34] Martin J. Wainwright. High-Dimensional Statistics: A Non-Asymptotic Viewpoint , 2019, Cambridge University Press.
[35] Eric Moulines,et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning , 2011, NIPS.
[36] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[37] Zhaoran Wang,et al. Variance Reduced Policy Evaluation with Smooth Function Approximation , 2019, NeurIPS.
[38] Hilbert J. Kappen,et al. On the Sample Complexity of Reinforcement Learning with a Generative Model , 2012, ICML.
[39] R. Durrett. Essentials of Stochastic Processes , 1999 .
[40] Nan Jiang,et al. Open Problem: The Dependence of Sample Complexity Lower Bounds on Planning Horizon , 2018, COLT.
[41] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[42] T. Cai,et al. An adaptation theory for nonparametric confidence intervals , 2004, math/0503662.
[43] W. Feller. An Introduction to Probability Theory and Its Applications, Volume II , 1967 .
[44] Dimitri P. Bertsekas,et al. Dynamic Programming and Stochastic Control , 1977, IEEE Transactions on Systems, Man, and Cybernetics.
[45] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[46] D. Donoho,et al. Geometrizing Rates of Convergence, III , 1991 .
[47] Tor Lattimore,et al. Near-optimal PAC bounds for discounted MDPs , 2014, Theor. Comput. Sci..
[48] Nathaniel Korda,et al. On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence , 2014, ICML.
[49] Tong Zhang,et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.
[50] Michael Kearns,et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms , 1998, NIPS.
[51] Yingbin Liang,et al. Reanalysis of Variance Reduced Temporal Difference Learning , 2020, ICLR.
[52] D. Donoho,et al. Geometrizing Rates of Convergence, II , 1991 .