Active Perception and Reinforcement Learning

This paper considers adaptive control architectures that integrate active sensorimotor systems with decision systems based on reinforcement learning. One unavoidable consequence of active perception is that the agent's internal representation often confounds external world states. We call this phenomenon perceptual aliasing and show that it destabilizes existing reinforcement learning algorithms with respect to the optimal decision policy. A new decision system that overcomes these difficulties is described. The system incorporates a perceptual subcycle within the overall decision cycle and uses a modified learning algorithm to suppress the effects of perceptual aliasing. The result is a control architecture that learns not only how to solve a task but also where to focus its attention in order to collect necessary sensory information.
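To make the perceptual-aliasing problem concrete, here is a minimal sketch (not the paper's architecture, just an illustration under assumed toy dynamics): two hidden world states, "A" and "B", emit the same observation, but each requires a different action. A tabular value estimate keyed on the observation alone is forced to average the two cases, so neither underlying state's optimal action can be distinguished.

```python
import random

random.seed(0)

# Two distinct world states share one observation ("aliased").
# The optimal action differs per state, so values keyed on the
# observation cannot represent the optimal policy.
REWARD = {("A", 0): 1.0, ("A", 1): 0.0,   # in state A, action 0 is best
          ("B", 0): 0.0, ("B", 1): 1.0}   # in state B, action 1 is best

q = {0: 0.0, 1: 0.0}        # value estimates for the single aliased observation
counts = {0: 0, 1: 0}       # per-action sample counts (sample-average updates)

for episode in range(5000):
    state = random.choice(["A", "B"])    # hidden world state, unseen by the agent
    action = random.choice([0, 1])       # uniform exploratory policy
    r = REWARD[(state, action)]
    # One-step episode: the learning target is just the immediate reward.
    counts[action] += 1
    q[action] += (r - q[action]) / counts[action]

# Both actions look equally good from the aliased observation (values
# near 0.5), even though each is optimal in only one underlying state.
print(q[0], q[1])
```

Under the paper's proposal, a perceptual subcycle would let the agent gather additional sensory information to split "A" from "B" before acting, restoring a consistent state representation.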
