Learning to role-switch in multi-robot systems

We present an approach that uses Q-learning on individual robotic agents to coordinate a mission-tasked team of robots in a complex scenario. To reduce the size of the state space, actions are grouped into sets of related behaviors called roles and represented as behavioral assemblages. A role is a finite state automaton, such as Forager, in which the behaviors and their sequencing for finding objects, collecting them, and returning them are already encoded and do not have to be re-learned. Each robot starts with the same set of possible roles, the same perceptual hardware for coordination, and no contact with other team members beyond what it can perceive. Over the course of training, a team of Q-learning robots converges to solutions that outperform a well-designed handcrafted homogeneous team.
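As a rough illustration of learning over roles rather than low-level actions, the following is a minimal sketch of a tabular Q-learner whose action set is a list of roles. It is not the authors' implementation. The role names other than Forager, the state encoding, the reward signal, and the hyperparameter values are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Hypothetical role set: only "forager" is named in the text above.
ROLES = ("forager", "soldier", "mechanic")


class RoleLearner:
    """Tabular Q-learning where each action is a whole role (assumed design)."""

    def __init__(self, roles=ROLES, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.roles = tuple(roles)
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.rng = random.Random(seed)
        self.q = defaultdict(float)  # maps (state, role) to an estimated value

    def choose_role(self, state):
        """Epsilon-greedy choice among roles for the perceived state."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.roles)
        return max(self.roles, key=lambda role: self.q[(state, role)])

    def update(self, state, role, reward, next_state):
        """Standard one-step Q-learning update for the role just played."""
        best_next = max(self.q[(next_state, r)] for r in self.roles)
        td_target = reward + self.gamma * best_next
        self.q[(state, role)] += self.alpha * (td_target - self.q[(state, role)])


# Usage: each robot would own one learner and see only its own percepts.
learner = RoleLearner(alpha=0.5, gamma=0.9, epsilon=0.0, seed=0)
learner.update("object_visible", "forager", 1.0, "carrying_object")
chosen = learner.choose_role("object_visible")
```

Because each role already encodes its own behavior sequencing, the table in this sketch only needs one entry per (state, role) pair, which is the kind of reduction the abstract describes. The state labels here are hypothetical strings; a real system would derive them from the robot's perception.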
