In-Person Oral presentation / top 25% paper

Adversarial Diversity in Hanabi

Brandon Cui · Andrei Lupu · Samuel Sokota · Hengyuan Hu · David Wu · Jakob Foerster

Auditorium
Livestream: Oral 3 Track 1: Reinforcement Learning
Tue 2 May, 1:00 a.m. to 1:10 a.m. PDT

Many Dec-POMDPs admit a qualitatively diverse set of "reasonable" joint policies, where reasonableness is indicated by symmetry equivariance, non-sabotaging behaviour, and graceful degradation of performance when paired with ad-hoc partners. Part of the diversity literature is concerned with generating such policies. Unfortunately, existing methods fail to produce teams of agents that are simultaneously diverse, high-performing, and reasonable. In this work, we propose a novel approach, adversarial diversity (ADVERSITY), designed for turn-based Dec-POMDPs with public actions. ADVERSITY relies on off-belief learning to encourage reasonableness and skill, and on "repulsive" fictitious transitions to encourage diversity. We use this approach to generate new agents with distinct but reasonable play styles for the card game Hanabi, and we open-source our agents for future research on (ad-hoc) coordination.
