暂无分享,去创建一个
[1] Fang Cao,et al. RVI reinforcement learning for semi-Markov decision processes with average reward , 2010, 2010 8th World Congress on Intelligent Control and Automation.
[2] Junhyuk Oh,et al. Discovery of Options via Meta-Learned Subgoals , 2021, NeurIPS.
[3] Shie Mannor,et al. Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning , 2002, ECML.
[4] Marlos C. Machado,et al. A Laplacian Framework for Option Discovery in Reinforcement Learning , 2017, ICML.
[5] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[6] Pieter Abbeel,et al. Variational Option Discovery Algorithms , 2018, ArXiv.
[7] Nuttapong Chentanez,et al. Intrinsically Motivated Reinforcement Learning , 2004, NIPS.
[8] Vivek S. Borkar,et al. Learning Algorithms for Markov Decision Processes with Average Cost , 2001, SIAM J. Control. Optim..
[9] V. Borkar. Asynchronous Stochastic Approximations , 1998 .
[10] S. Mahadevan,et al. Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning , 1999 .
[11] P. Schweitzer. Iterative solution of the functional equations of undiscounted Markov renewal programming , 1971 .
[12] Paul J. Schweitzer,et al. The Functional Equations of Undiscounted Markov Renewal Programming , 1971, Math. Oper. Res..
[13] Alessandro Lazaric,et al. Exploration – Exploitation in MDPs with Options , 2016 .
[14] Sergey Levine,et al. Diversity is All You Need: Learning Skills without a Reward Function , 2018, ICLR.
[15] Shalabh Bhatnagar,et al. Universal Option Models , 2014, NIPS.
[16] Shimon Whiteson,et al. Average-Reward Off-Policy Policy Evaluation with Function Approximation , 2021, ICML.
[17] Andrew G. Barto,et al. Using relative novelty to identify useful temporal abstractions in reinforcement learning , 2004, ICML.
[18] Richard S. Sutton,et al. Learning and Planning in Average-Reward Markov Decision Processes , 2020, ICML.
[19] Lihong Li,et al. PAC-inspired Option Discovery in Lifelong Reinforcement Learning , 2014, ICML.
[20] Doina Precup,et al. The Option-Critic Architecture , 2016, AAAI.
[21] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[22] Daniel Polani,et al. Grounding subgoals in information transitions , 2011, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[23] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[24] V. Borkar,et al. An analog scheme for fixed point computation. I. Theory , 1997 .
[25] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[26] Abhijit Gosavi,et al. Reinforcement learning for long-run average cost , 2004, Eur. J. Oper. Res..
[27] Vivek S. Borkar,et al. An analog scheme for fixed-point computation-Part II: Applications , 1999 .
[28] TaeChoong Chung,et al. Policy Gradient Semi-markov Decision Process , 2008, 2008 20th IEEE International Conference on Tools with Artificial Intelligence.
[29] Satinder P. Singh,et al. Linear options , 2010, AAMAS.
[30] Daan Wierstra,et al. Variational Intrinsic Control , 2016, ICLR.
[31] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008, Texts and Readings in Mathematics.
[32] Andrew G. Barto,et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density , 2001, ICML.