Martin J. Wainwright | Michael I. Jordan | Koulik Khamaru | Eric Xia
[1] Martin J. Wainwright, et al. High-Dimensional Statistics, 2019.
[2] T. Cai, et al. A Framework For Estimation of Convex Functions, 2015.
[3] Michael I. Jordan, et al. Technical report, MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[4] Emma Brunskill, et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds, 2019, ICML.
[5] Martin J. Wainwright, et al. Variance-reduced Q-learning is minimax optimal, 2019, arXiv.
[6] Jalaj Bhandari, et al. A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation, 2018, COLT.
[7] Shie Mannor, et al. "How hard is my MDP?" The distribution-norm to the rescue, 2014, NIPS.
[8] Martin J. Wainwright, et al. Instance-Dependent ℓ∞-Bounds for Policy Evaluation in Tabular Reinforcement Learning, 2021, IEEE Transactions on Information Theory.
[9] Martin J. Wainwright, et al. Stochastic approximation with cone-contractive operators: Sharp ℓ∞-bounds for Q-learning, 2019, arXiv.
[10] Mykel J. Kochenderfer, et al. Almost Horizon-Free Structure-Aware Best Policy Identification with a Generative Model, 2019, NeurIPS.
[11] Xian Wu, et al. Variance reduced value iteration and faster algorithms for solving Markov decision processes, 2017, SODA.
[12] Martin J. Wainwright, et al. Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis, 2020, SIAM J. Math. Data Sci.
[13] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[14] Martin L. Puterman, et al. On the Convergence of Policy Iteration in Stationary Dynamic Programming, 1979, Math. Oper. Res.
[15] Csaba Szepesvári, et al. Linear Stochastic Approximation: How Far Does Constant Step-Size and Iterate Averaging Go?, 2018, AISTATS.
[16] Csaba Szepesvári, et al. The Asymptotic Convergence-Rate of Q-learning, 1997, NIPS.
[17] Wojciech Zaremba, et al. Domain randomization for transferring deep neural networks from simulation to the real world, 2017, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[18] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[19] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[20] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[21] E. M. Hartwell. Boston, 1906.
[22] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[23] Max Simchowitz, et al. Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs, 2019, NeurIPS.
[24] Xian Wu, et al. Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model, 2018, NeurIPS.
[25] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[26] Hilbert J. Kappen, et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012, ICML.
[27] Lucien Birgé. Approximation dans les espaces métriques et théorie de l'estimation, 1983.
[28] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[29] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[30] Shie Mannor, et al. Finite Sample Analyses for TD(0) With Function Approximation, 2017, AAAI.