Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning

We present the Q-Cut algorithm, a graph-theoretic approach for automatic detection of sub-goals in a dynamic environment, used to accelerate the Q-Learning algorithm. The learning agent builds an on-line map of its process history and uses an efficient Max-Flow/Min-Cut algorithm to identify bottlenecks. The policies for reaching bottlenecks are learned separately and added to the model in the form of options (macro-actions). We then extend the basic Q-Cut algorithm to the Segmented Q-Cut algorithm, which uses previously identified bottlenecks to partition the state space, a step necessary for finding additional bottlenecks in complex environments. Experiments show significant performance improvements, particularly in the initial learning phase.
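The following is a minimal sketch (not the authors' implementation) of the graph-cut step described above: observed transitions are accumulated into a directed graph whose edge capacities count visits, and a Max-Flow/Min-Cut computation separates the source side from the sink side, exposing candidate bottleneck transitions. The use of networkx, the function name find_bottlenecks, the visit-count capacities, and the source/sink choice are all illustrative assumptions; the paper's cut-quality criterion and option-learning machinery are not reproduced here.

```python
import networkx as nx

def find_bottlenecks(transitions, source, sink):
    """transitions: iterable of (state, next_state) pairs from the agent's history."""
    G = nx.DiGraph()
    for s, s_next in transitions:
        if G.has_edge(s, s_next):
            G[s][s_next]['capacity'] += 1      # more observed traffic => higher capacity
        else:
            G.add_edge(s, s_next, capacity=1)
    # Min-cut edges separate the source side from the sink side; their endpoints
    # serve as candidate sub-goal (bottleneck) states.
    cut_value, (source_side, sink_side) = nx.minimum_cut(G, source, sink)
    cut_edges = [(u, v) for u, v in G.edges()
                 if u in source_side and v in sink_side]
    return cut_value, cut_edges

# Toy example: two "rooms" of states joined by a single doorway state 'd'.
history = [('a', 'b'), ('b', 'a'), ('b', 'd'), ('d', 'e'), ('e', 'f'), ('f', 'e')]
value, bottleneck_edges = find_bottlenecks(history, source='a', sink='f')
print(value, bottleneck_edges)  # a cut of capacity 1 along the corridor between the rooms
```

In this sketch the states flanking the cut edges would be treated as sub-goals, and separate policies for reaching them would then be learned and added as options.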
