Reward Tweaking: Maximizing the Total Reward While Planning for Short Horizons
[1] Zeyu Zheng, et al. On Learning Intrinsic Rewards for Policy Gradient Methods, 2018, NeurIPS.
[2] S. Banach. Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales, 1922.
[3] Timothy P. Lillicrap, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[4] Adrià Puigdomènech Badia, et al. Agent57: Outperforming the Atari Human Benchmark, 2020, ICML.
[5] Steven Kapturowski, et al. Recurrent Experience Replay in Distributed Reinforcement Learning, 2018, ICLR.
[6] Zhongwen Xu, et al. Meta-Gradient Reinforcement Learning, 2018, NeurIPS.
[7] Umar Syed, et al. A Game-Theoretic Approach to Apprenticeship Learning, 2007, NIPS.
[8] Marek Petrik, et al. Biasing Approximate Dynamic Programming with a Lower Discount Factor, 2008, NIPS.
[9] Ron Amit, et al. Discount Factor as a Regularizer in Reinforcement Learning, 2020, ICML.
[10] Kaifeng Lyu, et al. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks, 2019, ICLR.
[11] Alex Nichol, et al. Reptile: a Scalable Metalearning Algorithm, 2018.
[12] Daniel Soudry, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[13] Tim Salimans, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, arXiv.
[14] Scott Fujimoto, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[15] Daniel S. Brown, et al. Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations, 2019, ICML.
[16] D. Blackwell. Discrete Dynamic Programming, 1962.
[17] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 1998, Machine Learning.
[18] Sam Devlin, et al. Dynamic potential-based reward shaping, 2012, AAMAS.
[19] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[20] Tuomas Haarnoja, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[21] Yonathan Efroni, et al. Beyond the One Step Greedy Approach in Reinforcement Learning, 2018, ICML.
[22] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[23] Nan Jiang, et al. The Dependence of Effective Planning Horizon on Model Accuracy, 2015, AAMAS.
[24] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.
[25] Ilya Sutskever. Training Recurrent Neural Networks, 2013.
[26] John Schulman, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[27] Volodymyr Mnih, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[28] Matteo Hessel, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[29] Horia Mania, et al. Simple random search provides a competitive approach to reinforcement learning, 2018, arXiv.
[30] David Silver, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[31] Christoph Dann, et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning, 2015, NIPS.
[32] Chen Tessler, et al. A Deep Hierarchical Approach to Lifelong Learning in Minecraft, 2016, AAAI.
[33] Marc G. Bellemare, et al. Increasing the Action Gap: New Operators for Reinforcement Learning, 2015, AAAI.
[34] Amir Massoud Farahmand. Action-Gap Phenomenon in Reinforcement Learning, 2011, NIPS.
[35] Satinder Singh, et al. Intrinsically Motivated Reinforcement Learning, 2004, NIPS.
[36] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[37] Emanuel Todorov, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[38] Deepak Pathak, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).