Reinforcement learning for robots using neural networks

Reinforcement learning agents are adaptive, reactive, and self-supervised. The aim of this dissertation is to extend the state of the art of reinforcement learning and enable its application to complex robot-learning problems. In particular, it focuses on two issues. First, learning from sparse and delayed reinforcement signals is hard and, in general, slow, so techniques for reducing learning time must be devised. Second, most existing reinforcement learning methods assume that the world is a Markov decision process. This assumption is too strong for many robot tasks of interest. This dissertation demonstrates ways to overcome the slow-learning problem and to tackle non-Markovian environments, making reinforcement learning more practical for realistic robot tasks:

(1) Reinforcement learning can be naturally integrated with artificial neural networks to obtain high-quality generalization, resulting in a significant learning speedup. The neural networks used in this dissertation generalize effectively even in the presence of noise and a large number of binary and real-valued inputs.

(2) Reinforcement learning agents can save many learning trials by using an action model, which can be learned on-line. With a model, an agent can mentally experience the effects of its actions without actually executing them. Experience replay is a simple technique that implements this idea, and it is shown to be effective in reducing the number of action executions required.

(3) Reinforcement learning agents can take advantage of instructive training instances provided by human teachers, resulting in a significant learning speedup. Teaching can also help learning agents avoid local optima during the search for optimal control. Simulation experiments indicate that even a small amount of teaching can save agents many learning trials.

(4) Reinforcement learning agents can significantly reduce learning time by hierarchical learning: they first solve elementary learning problems and then combine the solutions to those elementary problems to solve a complex problem. Simulation experiments indicate that a robot with hierarchical learning can solve a complex problem that would otherwise be hard to solve within a reasonable time.

(5) Reinforcement learning agents can deal with a wide range of non-Markovian environments by keeping a memory of their past. Three memory architectures are discussed. They work reasonably well for a variety of simple problems, and one of them is also successfully applied to a nontrivial non-Markovian robot task.

The results of this dissertation rely on computer simulation of two settings: an agent operating in a dynamic and hostile environment, and a mobile robot operating in a noisy and non-Markovian environment. The robot simulator is physically realistic. This dissertation concludes that it is possible to build artificial agents that can acquire complex control policies effectively by reinforcement learning.
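To make the experience replay idea in point (2) concrete, the following is a minimal sketch and not the dissertation's implementation. It assumes a tabular Q-function in place of the neural networks the dissertation uses, a hypothetical four-state corridor, and illustrative values for the learning rate and discount factor. The function names `q_update` and `replay` are made up for this example, and replaying stored experiences most-recent-first is a design choice of the sketch.

```python
from collections import defaultdict


def q_update(q, transition, actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning backup to a stored experience.

    transition is (state, action, reward, next_state, done).
    alpha and gamma are illustrative values, not taken from the source.
    """
    s, a, r, s2, done = transition
    # Terminal transitions have no future value to bootstrap from.
    target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


def replay(q, buffer, actions, passes=1):
    """Re-present stored experiences without executing any new actions.

    Experiences are replayed most recent first (an assumption of this
    sketch) so that a reward at the end of a trajectory can propagate
    back to earlier states within a single pass.
    """
    for _ in range(passes):
        for transition in reversed(buffer):
            q_update(q, transition, actions)


# Hypothetical corridor: states 0..3, reward 1.0 on reaching state 3.
actions = ("left", "right")
buffer = [
    (0, "right", 0.0, 1, False),
    (1, "right", 0.0, 2, False),
    (2, "right", 1.0, 3, True),
]

q = defaultdict(float)  # all action values start at zero
replay(q, buffer, actions, passes=1)
```

In this toy setting, a single replay pass over three stored transitions gives every visited state a positive value for moving right. Processing the same three transitions once in the order they occurred would update only the final step, so the agent would have to traverse the corridor again to propagate the reward further back.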
