[1] Luca Bascetta, et al. Adaptive Step-Size for Policy Gradient Methods, 2013, NIPS.
[2] Angelia Nedic, et al. On Stochastic Subgradient Mirror-Descent Algorithm with Weighted Averaging, 2013, SIAM J. Optim.
[3] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[4] Sham M. Kakade, et al. On the Sample Complexity of Reinforcement Learning, 2003.
[5] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[6] A. Juditsky, et al. First-Order Methods for Nonsmooth Convex Large-Scale Optimization, I: General Purpose Methods, 2010.
[7] Matthieu Geist, et al. Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search, 2014, ECML/PKDD.
[8] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[9] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[10] Shalabh Bhatnagar, et al. Natural Actor-Critic Algorithms, 2009, Autom.
[11] Nicolas Le Roux, et al. Understanding the Impact of Entropy on Policy Optimization, 2018, ICML.
[12] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[13] Marcello Restelli, et al. Smoothing Policies and Safe Policy Gradients, 2019, Machine Learning.
[14] Bruno Scherrer, et al. Approximate Policy Iteration Schemes: A Comparison, 2014, ICML.
[15] Jalaj Bhandari, et al. Global Optimality Guarantees for Policy Gradient Methods, 2019, ArXiv.
[16] Qi Cai, et al. Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy, 2019, ArXiv.
[17] Dale Schuurmans, et al. Trust-PCL: An Off-Policy Trust Region Method for Continuous Control, 2017, ICLR.
[18] Matthieu Geist, et al. A Theory of Regularized Markov Decision Processes, 2019, ICML.
[19] Ofir Nachum, et al. Path Consistency Learning in Tsallis Entropy Regularized MDPs, 2018, ICML.
[21] Le Song, et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation, 2017, ICML.
[22] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[23] Sham M. Kakade, et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, 2019, J. Mach. Learn. Res.
[24] Vicenç Gómez, et al. A Unified View of Entropy-Regularized Markov Decision Processes, 2017, ArXiv.
[25] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[27] Marc Teboulle, et al. Mirror Descent and Nonlinear Projected Subgradient Methods for Convex Optimization, 2003, Oper. Res. Lett.