Countering Deception in Multiagent Reinforcement Learning

ABSTRACT Multiagent Reinforcement Learning (MRL) is a growing area of research. What makes it particularly challenging is the non-stationarity of the target function. Most of the existing work in this area, however, addresses either stationary environments or self-play. We assume an asymmetric and non-stationary environment where other agents can be of arbitrary dispositions. In particular, agents can be malicious, modeling the algorithm used by the learner and exploiting its loopholes. We propose a simple learner that can naively counter such deceptions, but at the cost of sensitivity to noise. We then refine this strategy to maximize its counter-deception properties while minimizing its sensitivity to noise.