Behavioral building blocks for autonomous agents: description, identification, and learning

The broad problem I address in this dissertation is the design of autonomous agents that can efficiently learn how to achieve desired behaviors in large, complex environments. I focus on one essential design component: the ability to form new behavioral units, or skills, from existing ones. I propose a characterization of a useful class of skills in terms of general properties of an agent's interaction with its environment, rather than properties specific to any particular environment, and I introduce methods that can be used to identify and acquire such skills autonomously.
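To make the notion of a behavioral unit concrete, skills of this kind are commonly formalized as options in the sense of Sutton, Precup, and Singh (1999): a triple consisting of an initiation set, an internal policy, and a termination condition. The sketch below illustrates that structure in Python. It is a minimal illustration of the general formalism, not code from the dissertation; the names (`Option`, `run_option`, `env.step`) are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Hashable, Set

State = Hashable
Action = Hashable

@dataclass
class Option:
    """A skill as an option: (initiation set, policy, termination condition)."""
    initiation_set: Set[State]                   # states where the skill may be invoked
    policy: Callable[[State], Action]            # chooses an action in each state
    termination_prob: Callable[[State], float]   # beta(s): probability of ending in s

    def can_initiate(self, state: State) -> bool:
        return state in self.initiation_set

def run_option(env, option: Option, state: State) -> State:
    """Execute an option until its termination condition fires.

    Assumes a hypothetical environment interface in which env.step(action)
    returns the next state.
    """
    assert option.can_initiate(state), "option invoked outside its initiation set"
    while True:
        state = env.step(option.policy(state))
        if random.random() < option.termination_prob(state):
            return state
```

Because an option's policy may itself invoke other options, existing skills can serve as components of new, higher-level ones; this is the sense in which new behavioral units are formed from existing ones.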
