Behavioral Hierarchy: Exploration and Representation

Behavioral modules are reusable units of behavior that can be composed sequentially and hierarchically to generate extensive ranges of behavior. Hierarchies of behavioral modules facilitate learning complex skills and planning at multiple levels of abstraction, and they enable agents to incrementally improve their competence as new challenges arise over extended periods of time. This chapter focuses on two features of behavioral hierarchy that appear to be less well recognized: its influence on exploratory behavior and the opportunity it affords to reduce the representational challenges of planning and learning in large, complex domains. Four computational examples are described that use methods of hierarchical reinforcement learning to illustrate the influence of behavioral hierarchy on exploration and representation. Beyond illustrating these features, the examples provide support for the central role of behavioral hierarchy in development and learning for both artificial and natural agents.
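
Because the chapter's examples use methods of hierarchical reinforcement learning, a compact illustration may help fix ideas. The sketch below renders a behavioral module in the style of the options framework commonly used in hierarchical reinforcement learning: an initiation set, an internal policy, and a termination condition. This is a hypothetical Python illustration, not code from the chapter; the Option class, the run_option helper, and the toy corridor environment are all invented for this example.

    # Minimal sketch of a behavioral module as an "option" in the sense of
    # hierarchical reinforcement learning. All names here (Option, run_option,
    # the toy corridor) are illustrative assumptions, not the chapter's code.

    import random

    class Option:
        """A reusable unit of behavior: an initiation set, a policy,
        and a termination condition."""
        def __init__(self, init_set, policy, terminates):
            self.init_set = init_set      # states where the option may start
            self.policy = policy          # maps state -> primitive action
            self.terminates = terminates  # maps state -> True when done

    def run_option(env_step, state, option):
        """Execute one option to termination, returning the resulting state
        and the primitive actions taken (one temporally extended step)."""
        assert state in option.init_set
        actions = []
        while not option.terminates(state):
            action = option.policy(state)
            actions.append(action)
            state = env_step(state, action)
        return state, actions

    # Toy corridor: states 0..9, primitive actions -1 (left) and +1 (right).
    def step(state, action):
        return max(0, min(9, state + action))

    # A module that reliably reaches state 5 from anywhere in the corridor.
    go_to_middle = Option(
        init_set=set(range(10)),
        policy=lambda s: 1 if s < 5 else -1,
        terminates=lambda s: s == 5,
    )

    state, taken = run_option(step, random.randrange(10), go_to_middle)
    print(state, len(taken))  # always ends at 5, after |start - 5| steps

Executing such an option carries the agent many primitive steps in a single decision, which is one sense in which behavioral hierarchy reshapes exploration: the agent's search proceeds over subgoal-reaching modules rather than diffusing one primitive action at a time.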
