Autonomous discovery of temporal abstractions from interaction with an environment

The ability to create and use abstractions in complex environments, that is, to systematically ignore irrelevant details, is a key reason that humans are effective problem solvers. Although the utility of abstraction is widely accepted, relatively little research has addressed how useful abstractions can be discovered or created autonomously. A system that can create new abstractions on its own can learn and plan in situations that its original designer was not able to anticipate. This dissertation introduces two related methods that allow an agent to autonomously discover and create temporal abstractions from its accumulated experience with its environment. A temporal abstraction encapsulates a complex set of actions into a single higher-level action, allowing an agent to learn and plan while ignoring details that appear at finer levels of temporal resolution. The main idea of both methods is to search for patterns that occur frequently within an agent's successful experiences but do not occur in its unsuccessful ones; these patterns are then used to create the new temporal abstractions. The two types of temporal abstractions our methods create are (1) subgoals together with closed-loop policies for achieving them, and (2) open-loop policies, or action sequences, that serve as useful “macros.” We evaluate both types of temporal abstraction in several simulated tasks, including two simulated mobile robot tasks, and show that the autonomously created abstractions both accelerate learning within a task and enable effective knowledge transfer to related tasks. As a larger task, we focus on the difficult problem of scheduling assembly instructions for processors with multiple pipelines so that the reordered instructions execute as quickly as possible. We demonstrate that the autonomously discovered action sequences significantly improve the performance of the scheduler and enable effective knowledge transfer across similar processors. Both methods can extract temporal abstractions from collections of behavioral trajectories generated by different processes; in particular, we show that they are effective when applied to collections generated by reinforcement learning agents, heuristic searchers, and human tele-operators.
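
To make the shared idea concrete, the sketch below illustrates one way the pattern-search step could look: score contiguous action subsequences by how often they appear in successful trajectories versus unsuccessful ones, and promote the highest-scoring sequences to macro-actions. This is a minimal illustration, not the dissertation's actual algorithm; the trajectory representation (lists of primitive action labels), the scoring rule, and the action names in the example are assumptions made for the sketch.

```python
# Minimal sketch (assumed representation, not the dissertation's algorithm):
# score action subsequences by how often they appear in successful trajectories
# versus unsuccessful ones, and keep the best-scoring ones as candidate macros.

from collections import Counter

def subsequences(trajectory, min_len=2, max_len=4):
    """Enumerate contiguous action subsequences of bounded length."""
    for start in range(len(trajectory)):
        for length in range(min_len, max_len + 1):
            if start + length <= len(trajectory):
                yield tuple(trajectory[start:start + length])

def count_patterns(trajectories, min_len=2, max_len=4):
    """Count, for each pattern, the number of trajectories that contain it."""
    counts = Counter()
    for traj in trajectories:
        counts.update(set(subsequences(traj, min_len, max_len)))
    return counts

def discover_macros(successes, failures, top_k=3):
    """Return the action sequences most characteristic of successful runs."""
    pos = count_patterns(successes)
    neg = count_patterns(failures)
    # Score: fraction of successes containing the pattern minus the fraction
    # of failures containing it (one simple choice among many).
    scores = {
        pattern: pos[pattern] / len(successes)
                 - neg.get(pattern, 0) / max(len(failures), 1)
        for pattern in pos
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical example: the sequence ('forward', 'forward', 'turn_left')
# occurs in every success but in no failure, so it surfaces as a macro.
successes = [['forward', 'forward', 'turn_left', 'grab'],
             ['forward', 'forward', 'turn_left', 'forward', 'grab']]
failures = [['turn_left', 'forward', 'forward', 'turn_right'],
            ['forward', 'turn_right', 'forward']]
print(discover_macros(successes, failures))
```

Once such a sequence is identified, it can either be added directly as an open-loop macro-action or used to define a subgoal for which a closed-loop policy is then learned, corresponding to the two kinds of temporal abstraction described above.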
