Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning

We present the Q-Cut algorithm, a graph-theoretic approach for automatic detection of sub-goals in a dynamic environment, used to accelerate the Q-Learning algorithm. The learning agent builds an on-line map of its process history and uses an efficient Max-Flow/Min-Cut algorithm to identify bottlenecks. The policies for reaching bottlenecks are learned separately and added to the model in the form of options (macro-actions). We then extend the basic Q-Cut algorithm to the Segmented Q-Cut algorithm, which uses previously identified bottlenecks to partition the state space, a step necessary for finding additional bottlenecks in complex environments. Experiments show significant performance improvements, particularly in the initial learning phase.
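The following is a minimal sketch (not the authors' implementation) of the graph-cut step described above: observed transitions are accumulated into a directed graph whose edge capacities count visits, and a Max-Flow/Min-Cut computation separates the source side from the sink side, exposing candidate bottleneck transitions. The use of networkx, the function name find_bottlenecks, the visit-count capacities, and the source/sink choice are all illustrative assumptions; the paper's cut-quality criterion and option-learning machinery are not reproduced here.

```python
import networkx as nx

def find_bottlenecks(transitions, source, sink):
    """transitions: iterable of (state, next_state) pairs from the agent's history."""
    G = nx.DiGraph()
    for s, s_next in transitions:
        if G.has_edge(s, s_next):
            G[s][s_next]['capacity'] += 1      # more observed traffic => higher capacity
        else:
            G.add_edge(s, s_next, capacity=1)
    # Min-cut edges separate the source side from the sink side; their endpoints
    # serve as candidate sub-goal (bottleneck) states.
    cut_value, (source_side, sink_side) = nx.minimum_cut(G, source, sink)
    cut_edges = [(u, v) for u, v in G.edges()
                 if u in source_side and v in sink_side]
    return cut_value, cut_edges

# Toy example: two "rooms" of states joined by a single doorway state 'd'.
history = [('a', 'b'), ('b', 'a'), ('b', 'd'), ('d', 'e'), ('e', 'f'), ('f', 'e')]
value, bottleneck_edges = find_bottlenecks(history, source='a', sink='f')
print(value, bottleneck_edges)  # a cut of capacity 1 along the corridor between the rooms
```

In this sketch the states flanking the cut edges would be treated as sub-goals, and separate policies for reaching them would then be learned and added as options.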
