Continuous-Time Hierarchical Reinforcement Learning
暂无分享,去创建一个
[1] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[2] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[3] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[4] Gang Wang,et al. Hierarchical Optimization of Policy-Coupled Semi-Markov Decision Processes , 1999, ICML.
[5] Ronald E. Parr,et al. Hierarchical control and learning for markov decision processes , 1998 .
[6] Michael O. Duff,et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems , 1994, NIPS.
[7] SRIDHAR MAHADEVAN,et al. Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results , 2005, Machine Learning.
[8] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[9] Prasad Tadepalli,et al. Auto-Exploratory Average Reward Reinforcement Learning , 1996, AAAI/IAAI, Vol. 1.