

In-Person Poster presentation / poster accept

RPM: Generalizable Multi-Agent Policies for Multi-Agent Reinforcement Learning

Wei Qiu · Xiao Ma · Bo An · Svetlana Obraztsova · Shuicheng Yan · Zhongwen Xu

MH1-2-3-4 #102

Keywords: [ Reinforcement Learning ] [ multi-agent reinforcement learning ] [ multi-agent system ]


Abstract:

Despite recent advances in multi-agent reinforcement learning (MARL), MARL agents easily overfit the training environment and perform poorly in evaluation scenarios where other agents behave differently. Obtaining generalizable policies for MARL agents is therefore necessary but challenging, mainly because of the complexity of multi-agent interactions. In this work, we model the MARL problem with Markov games and propose a simple yet effective method, ranked policy memory (RPM), which maintains a look-up memory of policies to achieve good generalizability. The main idea of RPM is to train MARL policies by gathering massive amounts of multi-agent interaction data. Specifically, we first rank each agent's policies by their training episode return, i.e., the episode return each agent obtains in the training environment; we then save the ranked policies in the memory; when an episode starts, each agent randomly selects a policy from RPM as its behavior policy and uses it to gather multi-agent interaction data for MARL training. This self-play framework ensures diverse multi-agent interactions in the training data. Experimental results on Melting Pot demonstrate that RPM enables MARL agents to interact with unseen agents in multi-agent generalization evaluation scenarios and complete the given tasks, boosting performance by up to 818% on average.
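A minimal sketch of the ranked policy memory idea described above, assuming a simple return-bucketed ranking; the class and method names (RankedPolicyMemory, save, sample) and the bucketing scheme are illustrative assumptions, not the paper's actual implementation:

    import random
    from collections import defaultdict

    class RankedPolicyMemory:
        """Look-up memory of policy checkpoints, keyed by a rank derived
        from their training episode return (illustrative sketch)."""

        def __init__(self, rank_resolution: float = 10.0):
            # Policies whose episode returns fall into the same bucket share a rank.
            self.rank_resolution = rank_resolution
            self.memory = defaultdict(list)  # rank -> list of saved policy parameters

        def save(self, policy_params, episode_return: float):
            # Rank the policy by its (discretized) training episode return and store it.
            rank = int(episode_return // self.rank_resolution)
            self.memory[rank].append(policy_params)

        def sample(self):
            # At the start of an episode, each agent draws a behavior policy:
            # first pick a rank uniformly at random, then a checkpoint within that rank.
            if not self.memory:
                return None
            rank = random.choice(list(self.memory.keys()))
            return random.choice(self.memory[rank])

Training would then alternate between saving the current policy with its latest episode return and letting every agent sample a behavior policy from the memory to collect diverse multi-agent interaction data for MARL training.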
