Guided Reinforcement Learning Under Partial Observability
[1] Edward J. Sondik, et al. The optimal control of partially observable Markov processes, 1971.
[2] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.
[3] Yoshua Bengio, et al. An Input Output HMM Architecture, 1994, NIPS.
[4] Michael L. Littman, et al. Memoryless policies: theoretical limitations and practical results, 1994.
[5] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[6] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[7] Andrew McCallum, et al. Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State, 1995, ICML.
[8] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[9] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[10] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[11] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[12] Andrew W. Moore, et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.
[13] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[14] Kee-Eung Kim, et al. Learning Finite-State Controllers for Partially Observable Environments, 1999, UAI.
[15] Bram Bakker, et al. Reinforcement Learning with Long Short-Term Memory, 2001, NIPS.
[16] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[17] Joelle Pineau, et al. Point-based value iteration: An anytime algorithm for POMDPs, 2003, IJCAI.
[18] Reid G. Simmons, et al. Heuristic Search Value Iteration for POMDPs, 2004, UAI.
[19] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[20] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[21] Jürgen Schmidhuber, et al. Solving Deep Memory POMDPs with Recurrent Policy Gradients, 2007, ICANN.
[22] Jürgen Schmidhuber, et al. Policy Gradient Critics, 2007, ECML.
[23] Stefan Schaal, et al. Reinforcement learning of motor skills with policy gradients, 2008, Neural Networks.
[24] Eduardo F. Morales, et al. An Introduction to Reinforcement Learning, 2011.
[25] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ IROS.
[26] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, arXiv.
[27] Sergey Levine, et al. Guided Policy Search, 2013, ICML.
[28] Jan Peters, et al. Reinforcement learning in robotics: A survey, 2013, Int. J. Robotics Res.
[29] Jan Peters, et al. A Survey on Policy Search for Robotics, 2013, Found. Trends Robotics.
[30] Sergey Levine, et al. Learning Complex Neural Network Policies with Trajectory Optimization, 2014, ICML.
[31] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[32] Ming Yang, et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification, 2014, IEEE CVPR.
[33] Sergey Levine, et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics, 2014, NIPS.
[34] Nils Jansen, et al. Accelerating Parametric Probabilistic Verification, 2014, QEST.
[35] Yuval Tassa, et al. Learning Continuous Control Policies by Stochastic Value Gradients, 2015, NIPS.
[36] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[37] David Silver, et al. Memory-based control with recurrent neural networks, 2015, arXiv.
[38] Nolan Wagener, et al. Learning contact-rich manipulation skills with guided policy search, 2015, IEEE ICRA.
[39] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[40] Philip S. Thomas, et al. Safe Reinforcement Learning, 2015.
[41] Peter Stone, et al. Deep Recurrent Q-Learning for Partially Observable MDPs, 2015, AAAI Fall Symposia.
[42] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[43] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[44] Sergey Levine, et al. Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search, 2016, IEEE ICRA.
[45] Sergey Levine, et al. Learning deep neural network policies with continuous memory states, 2016, IEEE ICRA.
[46] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[47] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[48] Sergey Levine, et al. Guided Policy Search via Approximate Mirror Descent, 2016, NIPS.
[49] Wojciech Zaremba, et al. OpenAI Gym, 2016, arXiv.
[50] Anil A. Bharath, et al. Deep Reinforcement Learning: A Brief Survey, 2017, IEEE Signal Processing Magazine.
[51] Peter Stone, et al. Reinforcement learning, 2019, Scholarpedia.
[52] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[53] Gerhard Neumann, et al. Guided Deep Reinforcement Learning for Swarm Systems, 2017, arXiv.
[54] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[55] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[56] Joseph Futoma, et al. Prediction-Constrained POMDPs, 2018.
[57] Joelle Pineau, et al. Recurrent Value Functions, 2019, arXiv.
[58] Jan Peters, et al. Compatible natural gradient policy search, 2019, Machine Learning.
[59] Nils Jansen, et al. Counterexample-Guided Strategy Improvement for POMDPs Using Recurrent Neural Networks, 2019, IJCAI.
[60] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.