Learning to coordinate visual behaviors

This dissertation explores the problem of visually guided control. The focus is not on the details of image processing but on understanding the role that vision plays within the context of an active agent. More specifically, we focus on managing vision in multiple-goal tasks. When multiple tasks are addressed simultaneously, conflicts arise because of limitations on sensor and effector availability and on computational capacity. This dissertation describes principled, decision-theoretic ways of resolving those conflicts. The test bed for this work is a graphical human that processes a rendered video stream in order to navigate through a realistically modeled urban environment. The goal of this work is to understand visually guided behavior both as it relates to the engineering of embodied mobile agents and as it relates to the science of human vision. We demonstrate an approach to managing vision for the virtual agent, and we also present experimental results showing that the same framework can effectively model human eye-movement scheduling.
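The decision-theoretic flavor of this conflict handling can be sketched in miniature. In the sketch below (an illustrative toy, not the dissertation's exact algorithm), each behavior carries Q-values over its possibly stale state estimate, and a single perceptual resource is granted each time step to the behavior that stands to lose the most expected reward by acting without a fresh observation. All behavior names, states, and numbers here are hypothetical.

```python
def expected_loss(q_values, belief):
    """Expected cost of acting on a stale belief: the value attainable
    if the true state were known, minus the value of the single action
    that is best on average under the current belief."""
    n_actions = len(next(iter(q_values.values())))
    # Expected Q of each action under the current belief.
    avg_q = [sum(p * q_values[s][a] for s, p in belief.items())
             for a in range(n_actions)]
    best_avg = max(avg_q)
    # Expected value if the behavior could observe its true state.
    best_known = sum(p * max(q_values[s]) for s, p in belief.items())
    return best_known - best_avg

def select_gaze(behaviors):
    """Grant the perceptual resource to the behavior with the
    largest expected loss from remaining unobserved."""
    return max(behaviors, key=lambda b: expected_loss(*behaviors[b]))

# Hypothetical behaviors: (Q-values per state, belief over states).
behaviors = {
    "avoid_obstacle": ({"clear": [1.0, 0.0], "blocked": [0.0, 2.0]},
                       {"clear": 0.5, "blocked": 0.5}),
    "follow_sidewalk": ({"on": [1.0, 0.5], "off": [0.8, 1.0]},
                        {"on": 0.9, "off": 0.1}),
}
print(select_gaze(behaviors))  # the maximally uncertain, high-stakes behavior
```

Here the obstacle-avoidance behavior wins the gaze: its belief is evenly split between states that demand very different actions, so resolving its uncertainty buys the most expected reward.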
