[1] Vivek S. Borkar,et al. A sensitivity formula for risk-sensitive cost and the actor-critic algorithm , 2001, Syst. Control. Lett..
[2] A. Yiannakos,et al. Evaluation of the goal scoring patterns in European Championship in Portugal 2004. , 2006 .
[3] Scott Kuindersma,et al. Variable risk control via stochastic optimization , 2013, Int. J. Robotics Res..
[4] Laurent El Ghaoui,et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices , 2005, Oper. Res..
[5] R. Dolan,et al. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans , 2006, Nature.
[6] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[7] Robert Givan,et al. Bounded Parameter Markov Decision Processes , 1997, ECP.
[8] Shie Mannor,et al. Adaptive Skills Adaptive Partitions (ASAP) , 2016, NIPS.
[9] Shie Mannor,et al. Policy Gradient for Coherent Risk Measures , 2015, NIPS.
[10] George Konidaris,et al. Value Function Approximation in Reinforcement Learning Using the Fourier Basis , 2011, AAAI.
[11] Bruno Castro da Silva,et al. Learning Parameterized Skills , 2012, ICML.
[12] Hani Hagras,et al. A hierarchical type-2 fuzzy logic control architecture for autonomous mobile robots , 2004, IEEE Transactions on Fuzzy Systems.
[13] P. Dayan,et al. Tonic dopamine: opportunity costs and the control of response vigor , 2007, Psychopharmacology.
[14] Shie Mannor,et al. Time-regularized interrupting options , 2014, ICML.
[15] J. Salamone,et al. Activational and effort-related aspects of motivation: neural mechanisms and implications for psychopathology. , 2016, Brain : a journal of neurology.
[16] V. Borkar. Stochastic approximation with two time scales , 1997 .
[17] Doina Precup,et al. The Option-Critic Architecture , 2016, AAAI.
[18] Shie Mannor,et al. Optimizing the CVaR via Sampling , 2014, AAAI.
[19] Chelsea C. White,et al. Markov Decision Processes with Imprecise Transition Probabilities , 1994, Oper. Res..
[20] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[21] Alex M. Andrew,et al. Reinforcement Learning: An Introduction , 1998 .
[22] Shie Mannor,et al. Iterative Hierarchical Optimization for Misspecified Problems (IHOMP) , 2016, ArXiv.
[23] Shie Mannor,et al. Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach , 2015, NIPS.
[24] Wyatt Newman. Team CASE and the 2007 DARPA Urban Challenge , 2007 .
[25] A. Tversky,et al. Prospect theory: an analysis of decision under risk , 2007 .
[26] Doina Precup,et al. Multi-time Models for Temporally Abstract Planning , 1997, NIPS.
[27] Goldie Nejat,et al. Multirobot Cooperative Learning for Semiautonomous Control in Urban Search and Rescue Applications , 2016, J. Field Robotics.
[28] Xiaoping Chen,et al. Online Planning for Large Markov Decision Processes with Hierarchical Decomposition , 2015, ACM Trans. Intell. Syst. Technol..
[29] E. Fernandez-Gaucherand,et al. Controlled Markov chains with exponential risk-sensitive criteria: modularity, structured policies and applications , 1998, Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171).
[30] Peter Stone,et al. Deep Reinforcement Learning in Parameterized Action Space , 2015, ICLR.
[31] P. Dayan. Instrumental vigour in punishment and reward , 2012, The European journal of neuroscience.
[32] Tomoharu Nakashima,et al. HELIOS Base: An Open Source Package for the RoboCup Soccer 2D Simulation , 2013, RoboCup.
[33] Lihong Li,et al. PAC-inspired Option Discovery in Lifelong Reinforcement Learning , 2014, ICML.
[34] P. Dayan,et al. Safety out of control: dopamine and defence , 2016, Behavioral and Brain Functions.
[35] Shie Mannor,et al. Probabilistic Goal Markov Decision Processes , 2011, IJCAI.
[36] Stefan Schaal,et al. Policy Gradient Methods for Robotics , 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[37] Stefan Schaal,et al. 2008 Special Issue: Reinforcement learning of motor skills with policy gradients , 2008 .
[38] A. Dagher,et al. The role of dopamine in risk taking: a specific look at Parkinson’s disease and gambling , 2014, Front. Behav. Neurosci..
[39] E. Oleson,et al. A role for phasic dopamine release within the nucleus accumbens in encoding aversion: a review of the neurochemical literature. , 2015, ACS chemical neuroscience.
[40] Shie Mannor,et al. Learning When to Switch between Skills in a High Dimensional Domain , 2015, AAAI Workshop: Learning for General Competency in Video Games.
[41] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[42] Shie Mannor,et al. Policy Gradients with Variance Related Risk Criteria , 2012, ICML.
[43] Joel Z. Leibo,et al. Multi-agent Reinforcement Learning in Sequential Social Dilemmas , 2017, AAMAS.