[1] Pieter Abbeel, et al. Safe Exploration in Markov Decision Processes, 2012, ICML.
[2] R. Rockafellar, et al. Optimization of conditional value-at-risk, 2000.
[3] Fernando Paganini, et al. IEEE Transactions on Automatic Control, 2006.
[4] Alexander Shapiro, et al. Lectures on Stochastic Programming: Modeling and Theory, 2009.
[5] M. D. Wilkinson, et al. Management science, 1989, British Dental Journal.
[6] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[7] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[8] M. J. Sobel. The variance of discounted Markov decision processes, 1982.
[9] Michael I. Jordan, et al. Advances in Neural Information Processing Systems 30, 1995.
[10] E. Altman. Constrained Markov Decision Processes, 1999.
[11] P. Cochat, et al. Et al, 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.
[12] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[13] David M. W. Powers, et al. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation, 2011, ArXiv.
[14] G. G. Stokes. "J.", 1890, The New Yale Book of Quotations.
[15] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[16] Joelle Pineau, et al. Proceedings of the Twenty-Ninth International Conference on Machine Learning, 2012.
[17] Daniel Gooch, et al. Communications of the ACM, 2011, XRDS.
[18] D. Krass, et al. Percentile performance criteria for limiting average Markov decision processes, 1995, IEEE Trans. Autom. Control.
[19] R. Rosenfeld. Nature, 2009, Otolaryngology--head and neck surgery : official journal of American Academy of Otolaryngology-Head and Neck Surgery.
[20] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[21] Sven Koenig, et al. Functional Value Iteration for Decision-Theoretic Planning with General Utility Functions, 2006, AAAI.
[22] Marko Bacic, et al. Model predictive control, 2003.
[23] Marco Pavone, et al. Risk-Constrained Reinforcement Learning with Percentile Risk Criteria, 2015, J. Mach. Learn. Res.
[24] Pieter Abbeel, et al. Constrained Policy Optimization, 2017, ICML.
[25] John Schulman, et al. Concrete Problems in AI Safety, 2016, ArXiv.
[26] David Q. Mayne, et al. Model predictive control: Recent developments and future promise, 2014, Autom.
[27] Shie Mannor, et al. Optimizing the CVaR via Sampling, 2014, AAAI.
[28] Kathleen Daly. Volume 7, 1998.
[29] Javier García, et al. A comprehensive survey on safe reinforcement learning, 2015, J. Mach. Learn. Res.
[30] Makoto Sato, et al. TD algorithm for the variance of return and mean-variance reinforcement learning, 2001.
[31] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[32] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[33] Geoff Buckwell, et al. Number (AT 2), 1993.
[34] Masashi Sugiyama, et al. Parametric Return Density Estimation for Reinforcement Learning, 2010, UAI.
[35] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[36] Sven Koenig, et al. Existence and Finiteness Conditions for Risk-Sensitive Planning: Results and Conjectures, 2005, UAI.
[37] Peter W. Glynn, et al. Likelihood ratio gradient estimation for stochastic systems, 1990, CACM.
[38] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[39] John N. Tsitsiklis, et al. Mean-Variance Optimization in Markov Decision Processes, 2011, ICML.
[40] Ralph Neuneier, et al. Risk-Sensitive Reinforcement Learning, 1998, Machine Learning.
[41] R. Howard, et al. Risk-Sensitive Markov Decision Processes, 1972.
[42] Geoffrey E. Hinton, et al. Deep Learning, 2015, Nature.
[43] Fritz Wysotzki, et al. Risk-Sensitive Reinforcement Learning Applied to Control under Constraints, 2005, J. Artif. Intell. Res.
[44] Shie Mannor, et al. Policy Gradient for Coherent Risk Measures, 2015, NIPS.
[45] S. Crawford, et al. Volume 1, 2012, Journal of Diabetes Investigation.
[46] Eric Eaton, et al. Safe Policy Search for Lifelong Reinforcement Learning with Sublinear Regret, 2015, ICML.