Reinforcement learning for stochastic cooperative multi-agent systems

We present a distributed variant of Q-learning that learns the optimal cost-to-go function in stochastic cooperative multi-agent domains without any communication between the agents.
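To make the setting concrete, the following is a minimal sketch of the standard tabular Q-learning update written in cost-to-go (minimization) form, applied independently by two agents that share a cost signal but exchange no messages. It is an illustration under stated assumptions, not the distributed variant presented here: the names `q_update`, `q_agent_1`, `q_agent_2`, the action labels, and the values of `alpha` and `gamma` are all hypothetical. Plain independent learners of this kind carry no coordination guarantee by themselves.

```python
from typing import Dict, Hashable, Sequence, Tuple

# One table per agent, keyed by (state, own action). Hypothetical layout.
QTable = Dict[Tuple[Hashable, Hashable], float]


def q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    cost: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """Apply one standard Q-learning step in cost-to-go form.

    Unseen entries default to 0.0. The target uses the minimum over the
    agent's own actions because costs are minimized, not rewards maximized.
    """
    best_next = min(q.get((next_state, a), 0.0) for a in actions)
    target = cost + gamma * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]


# Two hypothetical agents, each with a table over its own actions only.
actions = ["left", "right"]
q_agent_1: QTable = {}
q_agent_2: QTable = {}

# Both agents observe the same transition and the same shared cost,
# but neither sees the other's action or table (no communication).
shared_cost = 1.0
v1 = q_update(q_agent_1, 0, "left", shared_cost, 1, actions, alpha=0.5)
v2 = q_update(q_agent_2, 0, "right", shared_cost, 1, actions, alpha=0.5)
```

Each agent indexes its table by its own action alone, so the joint action never has to be transmitted; the difficulty in stochastic domains is that the shared cost then depends on choices the agent cannot observe.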