Behavioral building blocks for autonomous agents: description, identification, and learning

The broad problem I address in this dissertation is the design of autonomous agents that can efficiently learn how to achieve desired behaviors in large, complex environments. I focus on one essential design component: the ability to form new behavioral units, or skills, from existing ones. I propose a characterization of a useful class of skills in terms of general properties of an agent's interaction with its environment, rather than properties specific to any particular environment, and I introduce methods that can be used to identify and acquire such skills autonomously.
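To make the notion of a behavioral unit concrete, skills of this kind are commonly formalized as options in the sense of Sutton, Precup, and Singh (1999): a triple consisting of an initiation set, an internal policy, and a termination condition. The sketch below illustrates that structure in Python. It is a minimal illustration of the general formalism, not code from the dissertation; the names (`Option`, `run_option`, `env.step`) are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Hashable, Set

State = Hashable
Action = Hashable

@dataclass
class Option:
    """A skill as an option: (initiation set, policy, termination condition)."""
    initiation_set: Set[State]                   # states where the skill may be invoked
    policy: Callable[[State], Action]            # chooses an action in each state
    termination_prob: Callable[[State], float]   # beta(s): probability of ending in s

    def can_initiate(self, state: State) -> bool:
        return state in self.initiation_set

def run_option(env, option: Option, state: State) -> State:
    """Execute an option until its termination condition fires.

    Assumes a hypothetical environment interface in which env.step(action)
    returns the next state.
    """
    assert option.can_initiate(state), "option invoked outside its initiation set"
    while True:
        state = env.step(option.policy(state))
        if random.random() < option.termination_prob(state):
            return state
```

Because an option's policy may itself invoke other options, existing skills can serve as components of new, higher-level ones; this is the sense in which new behavioral units are formed from existing ones.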
