Cooperative Q-learning: the knowledge sharing issue

A group of homogeneous Q-learning agents can cooperate to learn faster and gain more knowledge. To do so, each learner must be able to evaluate the expertness of the other agents and to assess the knowledge and information it receives from them. The learner also needs a suitable method for combining its own knowledge with what it gains from the other agents according to their relative expertness. In this paper, several expertness measures are introduced, along with a new cooperative learning method called weighted strategy sharing (WSS). In WSS, each agent assigns a weight to each teammate's knowledge based on that teammate's expertness and utilizes it accordingly. WSS and the expertness measures are tested on two simulated systems: hunter–prey and object pushing.
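The core of WSS is combining Q-tables by expertness-derived weights. The sketch below is a minimal illustration, not the paper's exact rule: the function name `wss_combine`, the simple proportional weighting, and the assumption that every agent adopts the same combined table are all simplifications introduced here for clarity (the paper defines several expertness measures and weighting schemes).

```python
import numpy as np

def wss_combine(q_tables, expertness):
    """Illustrative weighted strategy sharing: replace each agent's
    Q-table with an expertness-weighted average of all agents' tables.

    q_tables:   list of (states x actions) arrays, one per agent
    expertness: list of non-negative expertness scores, one per agent
    """
    e = np.asarray(expertness, dtype=float)
    w = e / e.sum()                  # normalize scores into weights
    stacked = np.stack(q_tables)     # shape: (agents, states, actions)
    # weighted sum over the agent axis
    return np.tensordot(w, stacked, axes=1)
```

With equal expertness this reduces to a plain average of the teammates' Q-tables; a more expert teammate's table dominates the combination as its score grows.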
