[1] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[2] Bruno Castro da Silva, et al. Learning Parameterized Skills, 2012, ICML.
[3] Stefan Schaal, et al. 2008 Special Issue: Reinforcement learning of motor skills with policy gradients, 2008.
[4] Pravesh Ranchod, et al. Reinforcement Learning with Parameterized Actions, 2015, AAAI.
[5] Xiaoping Chen, et al. Online Planning for Large Markov Decision Processes with Hierarchical Decomposition, 2015, ACM Trans. Intell. Syst. Technol.
[6] Shie Mannor, et al. Probabilistic Goal Markov Decision Processes, 2011, IJCAI.
[7] Hani Hagras, et al. A hierarchical type-2 fuzzy logic control architecture for autonomous mobile robots, 2004, IEEE Transactions on Fuzzy Systems.
[8] Shie Mannor, et al. Learning When to Switch between Skills in a High Dimensional Domain, 2015, AAAI Workshop: Learning for General Competency in Video Games.
[9] Tomoharu Nakashima, et al. HELIOS Base: An Open Source Package for the RoboCup Soccer 2D Simulation, 2013, RoboCup.
[10] Lihong Li, et al. PAC-inspired Option Discovery in Lifelong Reinforcement Learning, 2014, ICML.
[11] Shie Mannor, et al. Time-regularized interrupting options, 2014, ICML.
[12] Peter Stone, et al. Deep Reinforcement Learning in Parameterized Action Space, 2015, ICLR.
[13] Shie Mannor, et al. Adaptive Skills Adaptive Partitions (ASAP), 2016, NIPS.
[14] A. Yiannakos, et al. Evaluation of the goal scoring patterns in European Championship in Portugal 2004, 2006.
[15] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[16] Mica R. Endsley, et al. Toward a Theory of Situation Awareness in Dynamic Systems, 1995, Hum. Factors.
[17] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[18] Shie Mannor, et al. Policy Gradient for Coherent Risk Measures, 2015, NIPS.
[19] V. Borkar. Stochastic approximation with two time scales, 1997.
[20] Alex M. Andrew, et al. Reinforcement Learning: An Introduction, 1998.
[21] Goldie Nejat, et al. Multirobot Cooperative Learning for Semiautonomous Control in Urban Search and Rescue Applications, 2016, J. Field Robotics.
[22] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[23] Andrew G. Barto, et al. PolicyBlocks: An Algorithm for Creating Useful Macro-Actions in Reinforcement Learning, 2002, ICML.
[24] Doina Precup, et al. Multi-time Models for Temporally Abstract Planning, 1997, NIPS.
[25] E. Fernandez-Gaucherand, et al. Controlled Markov chains with exponential risk-sensitive criteria: modularity, structured policies and applications, 1998, Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171).
[26] Shie Mannor, et al. Iterative Hierarchical Optimization for Misspecified Problems (IHOMP), 2016, ArXiv.
[27] Sebastian Thrun, et al. Lifelong robot learning, 1993, Robotics Auton. Syst.
[28] Kip Smith, et al. Situation Awareness Is Adaptive, Externally Directed Consciousness, 1995, Hum. Factors.
[29] Doina Precup, et al. The Option-Critic Architecture, 2016, AAAI.
[30] Shie Mannor, et al. Optimizing the CVaR via Sampling, 2014, AAAI.