Jordi Grau-Moya | Grégoire Delétang | Markus Kunesch | Tim Genewein | Rob Brekelmans | Shane Legg | Pedro A. Ortega
[1] Daniel A. Braun, et al. Thermodynamics as a theory of decision-making with information-processing costs, 2012, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences.
[2] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[3] Stefan Schaal, et al. A Generalized Path Integral Control Approach to Reinforcement Learning, 2010, J. Mach. Learn. Res..
[4] Emanuel Todorov, et al. Linearly-solvable Markov decision problems, 2006, NIPS.
[5] Laurent El Ghaoui, et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices, 2005, Oper. Res..
[6] Ryota Tomioka, et al. Regularized Policies are Reward Robust, 2021, AISTATS.
[7] Vicenç Gómez, et al. Optimal control as a graphical model inference problem, 2009, Machine Learning.
[8] Samuel J. Gershman, et al. Do learning rates adapt to the distribution of rewards?, 2015, Psychonomic Bulletin & Review.
[9] Daniel Polani, et al. Information Theory of Decisions and Actions, 2011.
[10] Yoshua Bengio, et al. Incorporating Second-order Functional Knowledge for Better Option Pricing, CIRANO Série Scientifique.
[11] John N. Tsitsiklis, et al. Neuro-dynamic programming: an overview, 1995, Proceedings of 1995 34th IEEE Conference on Decision and Control.
[12] P. Schrimpf, et al. Dynamic Programming, 2011.
[13] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[14] Stuart J. Russell, et al. Research Priorities for Robust and Beneficial Artificial Intelligence, 2015, AI Mag..
[15] G. Hunanyan, et al. Portfolio Selection, 2019, Finanzwirtschaft, Banken und Bankmanagement I / Finance, Banks and Bank Management.
[16] P. Dayan, et al. Neural Prediction Errors Reveal a Risk-Sensitive Reinforcement-Learning Process in the Human Brain, 2012, The Journal of Neuroscience.
[17] Daniel D. Lee, et al. An Adversarial Interpretation of Information-Theoretic Bounded Rationality, 2014, AAAI.
[18] Rémi Munos, et al. Recurrent Experience Replay in Distributed Reinforcement Learning, 2018, ICLR.
[19] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[20] R. Rescorla, et al. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement, 1972.
[21] Shie Mannor, et al. Scaling Up Robust MDPs using Function Approximation, 2014, ICML.
[22] Klaus Obermayer, et al. Risk-Sensitive Reinforcement Learning, 2013, Neural Computation.
[23] Michael I. Jordan, et al. Technical report, Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[24] Sergey Levine, et al. If MaxEnt RL is the Answer, What is the Question?, 2019, ArXiv.
[25] Daniel A. Braun, et al. Information, Utility and Bounded Rationality, 2011, AGI.
[26] Jordi Grau-Moya, et al. Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes, 2016, ECML/PKDD.
[27] Richard S. Sutton, et al. Time-Derivative Models of Pavlovian Reinforcement, 1990.
[28] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.
[29] H. Kappen. Path integrals and symmetry breaking for optimal control theory, 2005, physics/0505066.
[30] Javier García, et al. A comprehensive survey on safe reinforcement learning, 2015, J. Mach. Learn. Res..
[31] Michèle Sebag, et al. Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits, 2013, ACML.
[32] Shie Mannor, et al. A General Approach to Multi-Armed Bandits Under Risk Criteria, 2018, COLT.
[33] Marc Toussaint, et al. Robot trajectory optimization using approximate inference, 2009, ICML '09.