The role of exploration in learning control

Whenever an intelligent agent learns to control an unknown environment, two opposing objectives have to be combined. On the one hand, the environment must be sufficiently explored in order to identify a (sub-)optimal controller. For instance, a robot facing an unknown environment has to spend time moving around and acquiring knowledge. On the other hand, the environment must also be exploited during learning, i.e., experience gained during learning must be taken into account for action selection if one is interested in minimizing the costs of learning. For example, although a robot has to explore its environment, it should avoid collisions with obstacles once it has received some negative reward for collisions. For efficient learning, actions should thus be generated in such a way that the environment is explored while pain is avoided. This fundamental trade-off between exploration and exploitation demands efficient exploration capabilities, maximizing the effect of learning while minimizing the costs of exploration.
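One common way to make this trade-off concrete is tabular Q-learning (Watkins, 1989) combined with an undirected exploration rule such as epsilon-greedy action selection: with small probability the agent tries a random action (exploration), otherwise it picks the action with the highest current value estimate (exploitation). The following is only a minimal sketch of that idea; the class and parameter names (EpsilonGreedyAgent, epsilon, alpha, gamma) are illustrative assumptions, not notation from the text, and directed exploration strategies would replace the random choice with a more informed one.

    import random
    from collections import defaultdict

    class EpsilonGreedyAgent:
        """Minimal tabular Q-learner with epsilon-greedy exploration (illustrative sketch)."""

        def __init__(self, actions, epsilon=0.1, alpha=0.5, gamma=0.95):
            self.actions = actions          # available actions
            self.epsilon = epsilon          # exploration probability
            self.alpha = alpha              # learning rate
            self.gamma = gamma              # discount factor
            self.Q = defaultdict(float)     # Q[(state, action)] -> estimated return

        def act(self, state):
            # Exploration: with probability epsilon, try a random action.
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            # Exploitation: otherwise pick the action with the highest estimated value.
            return max(self.actions, key=lambda a: self.Q[(state, a)])

        def update(self, state, action, reward, next_state):
            # One-step Q-learning update toward the bootstrapped target.
            best_next = max(self.Q[(next_state, a)] for a in self.actions)
            target = reward + self.gamma * best_next
            self.Q[(state, action)] += self.alpha * (target - self.Q[(state, action)])

Raising epsilon increases exploration (more knowledge gained, but higher cost, e.g., more collisions), while lowering it favors exploitation of what has already been learned.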
