[1] Jan Peters, et al. Reinforcement learning in robotics: A survey, 2013, Int. J. Robotics Res.
[2] Laurent Orseau, et al. AI Safety Gridworlds, 2017, ArXiv.
[3] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[4] Mohammad Ghavamzadeh, et al. Variance-constrained actor-critic algorithms for discounted and average reward MDPs, 2014, Machine Learning.
[5] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[6] Leandros Tassiulas, et al. Control and optimization meet the smart power grid: scheduling of power demands for optimal energy management, 2010, e-Energy.
[7] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[8] Prashanth L.A., et al. Policy Gradients for CVaR-Constrained MDPs, 2014, ALT.
[9] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[10] Jing Peng, et al. Function Optimization using Connectionist Reinforcement Learning Algorithms, 1991.
[11] P. Krokhmal, et al. Portfolio optimization with conditional value-at-risk objective and constraints, 2001.
[12] Pieter Abbeel, et al. Constrained Policy Optimization, 2017, ICML.
[13] Shalabh Bhatnagar, et al. An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes, 2012, J. Optim. Theory Appl.
[14] Shie Mannor, et al. Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach, 2015, NIPS.
[15] Shie Mannor, et al. Policy Gradients with Variance Related Risk Criteria, 2012, ICML.
[16] Michael I. Jordan, et al. First-order methods almost always avoid saddle points: The case of vanishing step-sizes, 2019, NeurIPS.
[17] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.
[18] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[19] M. T. Wasan. Stochastic Approximation, 1969.
[20] Shie Mannor, et al. A Geometric Approach to Multi-Criterion Reinforcement Learning, 2004, J. Mach. Learn. Res.
[21] E. Altman. Constrained Markov Decision Processes, 1999.
[22] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[23] Sergey Levine, et al. Guided Policy Search, 2013, ICML.
[24] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[25] Ann Nowé, et al. Multi-objective reinforcement learning using sets of Pareto dominating policies, 2014, J. Mach. Learn. Res.
[26] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[27] Benjamin Recht, et al. Simple random search provides a competitive approach to reinforcement learning, 2018, ArXiv.
[28] Sergey Levine, et al. DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills, 2018, ACM Trans. Graph.
[29] Sergey Levine, et al. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates, 2017, IEEE International Conference on Robotics and Automation (ICRA).
[30] Qianchuan Zhao, et al. Optimization of Web Service-Based Control System for Balance Between Network Traffic and Delay, 2018, IEEE Trans. Autom. Sci. Eng.
[31] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[32] Honglak Lee, et al. Efficient L1 Regularized Logistic Regression, 2006, AAAI.
[33] Mohammad Ghavamzadeh, et al. Algorithms for CVaR Optimization in MDPs, 2014, NIPS.
[34] Vivek S. Borkar. An actor-critic algorithm for constrained Markov decision processes, 2005, Syst. Control. Lett.
[35] Yuval Tassa, et al. Safe Exploration in Continuous Action Spaces, 2018, ArXiv.
[36] Anders Krogh, et al. A Simple Weight Decay Can Improve Generalization, 1991, NIPS.
[37] Shie Mannor, et al. Variance Adjusted Actor Critic Algorithms, 2013, ArXiv.