Learning to perceive and act by trial and error

This article considers adaptive control architectures that integrate active sensory-motor systems with decision systems based on reinforcement learning. One unavoidable consequence of active perception is that the agent's internal representation often confounds distinct external world states. We call this phenomenon perceptual aliasing and show that it destabilizes existing reinforcement learning algorithms with respect to the optimal decision policy. We then describe a new decision system that overcomes these difficulties for a restricted class of decision problems. The system incorporates a perceptual subcycle within the overall decision cycle and uses a modified learning algorithm to suppress the effects of perceptual aliasing. The result is a control architecture that learns not only how to solve a task but also where to focus its visual attention in order to collect the sensory information the task requires.
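The core difficulty can be illustrated with a minimal sketch (a hypothetical two-state setup, not the task used in the paper): two distinct world states emit the same observation, but each rewards a different action. A tabular Q-learner indexed by observations must share a single Q-row for both states, so no observation-based policy can be optimal in both.

```python
import random

# Hypothetical aliasing demo: world states "A" and "B" both produce
# observation "o". State A rewards action 0; state B rewards action 1.
# Because the agent sees only "o", it maintains one shared Q-row.
random.seed(0)
ALPHA = 0.05
q = {("o", 0): 0.0, ("o", 1): 0.0}  # Q-values keyed by (observation, action)

def reward(state, action):
    # Each state rewards exactly one action.
    return 1.0 if (state == "A") == (action == 0) else 0.0

for _ in range(10000):
    state = random.choice(["A", "B"])  # hidden from the agent
    action = random.choice([0, 1])     # uniform exploration
    r = reward(state, action)
    # Single-step episodes, so the Q-update has no bootstrap term.
    q[("o", action)] += ALPHA * (r - q[("o", action)])

print(q)
```

Both Q-values drift toward roughly 0.5: from the aliased observation, each action succeeds about half the time, so the learner settles on a policy worth half of what a fully observing agent would earn. This is the kind of confounding the paper's perceptual subcycle is designed to resolve, by letting the agent actively gather the observations needed to disambiguate states.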
