论文信息 - Continuous-Time Hierarchical Reinforcement Learning

Continuous-Time Hierarchical Reinforcement Learning

Hierarchical reinforcement learning (RL) is a general framework which studies how to exploit the structure of actions and tasks to accelerate policy learning in large domains. Prior work in hierarchical RL, such as the MAXQ method, has been limited to the discrete-time discounted reward semiMarkov decision process (SMDP) model. This paper generalizes the MAXQ method to continuous-time discounted and average reward SMDP models. We describe two hierarchical reinforcement learning algorithms: continuous-time discounted reward MAXQ and continuous-time average reward MAXQ. We apply these algorithms to a complex multiagent AGV scheduling problem, and compare their performance and speed with each other, as well as several well-known AGV scheduling heuristics.

Sridhar Mahadevan | Mohammad Ghavamzadeh | M. Ghavamzadeh | S. Mahadevan

[1] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.

[2] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..

[3] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[4] Gang Wang,et al. Hierarchical Optimization of Policy-Coupled Semi-Markov Decision Processes , 1999, ICML.

[5] Ronald E. Parr,et al. Hierarchical control and learning for markov decision processes , 1998 .

[6] Michael O. Duff,et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems , 1994, NIPS.

[7] SRIDHAR MAHADEVAN,et al. Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results , 2005, Machine Learning.

[8] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[9] Prasad Tadepalli,et al. Auto-Exploratory Average Reward Reinforcement Learning , 1996, AAAI/IAAI, Vol. 1.