[1] Sergey Levine,et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems , 2020, ArXiv.
[2] Leonid Peshkin,et al. Learning from Scarce Experience , 2002, ICML.
[3] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[4] Reuven Y. Rubinstein,et al. Simulation and the Monte Carlo method , 1981, Wiley series in probability and mathematical statistics.
[5] Shie Mannor,et al. Reinforcement learning with Gaussian processes , 2005, ICML.
[6] Alberto Bemporad,et al. Predictive Control for Linear and Hybrid Systems , 2017 .
[7] Martha White,et al. An Off-policy Policy Gradient Theorem Using Emphatic Weightings , 2018, NeurIPS.
[8] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[9] Ilya Kostrikov,et al. AlgaeDICE: Policy Gradient from Arbitrary Experience , 2019, ArXiv.
[10] Emma Brunskill,et al. Off-Policy Policy Gradient with State Distribution Correction , 2019, UAI 2019.
[11] Gavin Taylor,et al. Kernelized value function approximation for reinforcement learning , 2009, ICML '09.
[12] Liming Xiang,et al. Kernel-Based Reinforcement Learning , 2006, ICIC.
[13] Martha White,et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning , 2015, J. Mach. Learn. Res..
[14] Jan Peters,et al. A Nonparametric Off-Policy Policy Gradient , 2020, AISTATS.
[15] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[16] Qiang Liu,et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation , 2018, NeurIPS.
[17] Stergios B. Fotopoulos,et al. All of Nonparametric Statistics , 2007, Technometrics.
[18] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[19] Philip Thomas,et al. Bias in Natural Actor-Critic Algorithms , 2014, ICML.
[20] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[21] Wojciech Zaremba,et al. OpenAI Gym , 2016, ArXiv.
[22] Marcello Restelli,et al. Policy Optimization via Importance Sampling , 2018, NeurIPS.
[23] E. Nadaraya. On Estimating Regression , 1964 .
[24] Oliver Kroemer,et al. A Non-Parametric Approach to Dynamic Programming , 2011, NIPS.
[25] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[26] Jan Peters,et al. An Upper Bound of the Bias of Nadaraya-Watson Kernel Regression under Lipschitz Assumptions , 2020, Stats.
[27] Doina Precup,et al. Off-Policy Deep Reinforcement Learning without Exploration , 2018, ICML.
[28] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[29] G. S. Watson,et al. Smooth regression analysis , 1964 .
[30] Andrea Bonarini,et al. MushroomRL: Simplifying Reinforcement Learning Research , 2020, J. Mach. Learn. Res..
[31] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[32] Pieter Abbeel,et al. On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient , 2010, NIPS.
[33] Craig Boutilier,et al. Non-delusional Q-learning and value-iteration , 2018, NeurIPS.
[34] Nicolas Meuleau,et al. Exploration in Gradient-Based Reinforcement Learning , 2001 .
[35] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[36] Xin Xu,et al. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning , 2007, IEEE Transactions on Neural Networks.
[37] Yasemin Altun,et al. Relative Entropy Policy Search , 2010 .
[38] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[39] Christian R. Shelton,et al. Policy Improvement for POMDPs Using Normalized Importance Sampling , 2001, UAI.
[40] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[41] Jianqing Fan. Design-adaptive Nonparametric Regression , 1992 .
[42] Jan Peters,et al. Policy Search for Motor Primitives , 2009, Künstliche Intell..
[43] Philip S. Thomas,et al. Is the Policy Gradient a Gradient? , 2019, AAMAS.
[44] Jan Peters,et al. Policy Search for Motor Primitives in Robotics , 2022 .
[45] Martha White,et al. Linear Off-Policy Actor-Critic , 2012, ICML.
[46] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[47] Sergey Levine,et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models , 2018, NeurIPS.
[48] Martha White,et al. Unifying Task Specification in Reinforcement Learning , 2016, ICML.
[49] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[50] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.