Knowledge-Based Multiagent Credit Assignment: A Study on Task Type and Critic Information

Multiagent credit assignment (MCA) is one of the major problems in realizing multiagent reinforcement learning. Because the environment is usually not intelligent enough to evaluate individual agents in a cooperative team, methods are needed for assigning credit to individual agents when only a single team reinforcement is available. No single technique can solve MCA in the general case. Our goal in this research is therefore, first, to present a new view of the problem and, second, to introduce the idea of using agents' knowledge to partially solve MCA. We propose an approach based on agents' learning histories and knowledge: knowledge evaluation-based credit assignment (KEBCA), together with certainty, a measure of agents' knowledge, is developed to judge agents' actions and to assign them proper credit. The proposed KEBCA method is general; however, we study it in some simulated extreme cases to gain better insight into the MCA problem and to evaluate our approach in such cases. More specifically, we study the effect of task type (and-type and or-type tasks) on solving the MCA problem in two cases. In the first case, some extra information at the team level is assumed to be available in addition to the team reinforcement. In the second case, no such extra information exists. In addition, the performance of the system is examined in the presence of uncertainty in the environment, modeled as noise on agents' actions. The information content of the team reinforcement and of the assumed extra information is calculated theoretically and discussed. The mathematical calculations confirm the related simulation results.
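
As a rough illustration of what the information content of a team reinforcement can mean, the sketch below computes the Shannon entropy of a binary team reward for and-type and or-type tasks. It is not the calculation from the paper. It assumes, hypothetically, that each of n agents acts correctly with the same probability p, independently of the others, that an and-type task is rewarded only when all agents act correctly, and that an or-type task is rewarded when at least one does. All function names are illustrative.

```python
from math import log2


def binary_entropy(q):
    """Shannon entropy, in bits, of a binary variable with P(1) = q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0  # a certain outcome carries no information
    return -q * log2(q) - (1.0 - q) * log2(1.0 - q)


def team_success_probability(p, n, task_type):
    """Probability that the team is rewarded (assumed task definitions).

    Assumption: "and" rewards the team only if all n agents act correctly;
    "or" rewards it if at least one agent acts correctly.
    """
    if task_type == "and":
        return p ** n
    if task_type == "or":
        return 1.0 - (1.0 - p) ** n
    raise ValueError("task_type must be 'and' or 'or'")


def reinforcement_entropy(p, n, task_type):
    """Entropy of the binary team reinforcement under the assumptions above."""
    return binary_entropy(team_success_probability(p, n, task_type))


if __name__ == "__main__":
    # Compare the two task types for a few team sizes at p = 0.5.
    for n in (1, 2, 4):
        h_and = reinforcement_entropy(0.5, n, "and")
        h_or = reinforcement_entropy(0.5, n, "or")
        print(f"n={n}: and-type {h_and:.4f} bits, or-type {h_or:.4f} bits")
```

Under these assumptions the team signal carries at most one bit per trial, however many agents contributed to it, which is one way to see why a single team reinforcement may be too little to judge each agent individually.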
