

Poster in Workshop: Gamification and Multiagent Solutions

V-Learning -- A Simple, Efficient, Decentralized Algorithm for Multiagent RL

Chi Jin · Qinghua Liu · Yuanhao Wang · Tiancheng Yu


Abstract: A major challenge of multiagent reinforcement learning (MARL) is \emph{the curse of multiagents}, where the size of the joint action space scales exponentially with the number of agents. This remains a bottleneck for designing efficient MARL algorithms even in the basic setting with finitely many states and actions. This paper resolves this challenge for the model of episodic Markov games. We design a new class of fully decentralized algorithms---V-learning---which provably learns Nash equilibria (in the two-player zero-sum setting), correlated equilibria, and coarse correlated equilibria (in the multiplayer general-sum setting) in a number of samples that scales only with $\max_{i\in[m]} A_i$, where $A_i$ is the number of actions for the $i$-th player. This is in sharp contrast to the size of the joint action space, which is $\prod_{i=1}^m A_i$.

V-learning (in its basic form) is a new class of single-agent RL algorithms that converts any adversarial bandit algorithm with suitable regret guarantees into an RL algorithm. Similar to the classical Q-learning algorithm, it performs incremental updates to the value functions. Different from Q-learning, it maintains only estimates of V-values instead of Q-values. This key difference allows V-learning to achieve the claimed guarantees in the MARL setting by simply letting all agents run V-learning independently.
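To make the description above concrete, here is a minimal, illustrative sketch of the V-learning idea in Python. It assumes a tabular episodic setting with a known horizon, uses an EXP3-style learner as a stand-in for the adversarial bandit subroutine, and picks illustrative step sizes and bonus terms; the class names, constants, and specific bandit update below are assumptions for exposition, not the paper's exact algorithm or guarantees.

```python
# Sketch of decentralized V-learning: each agent keeps only V-value
# estimates (indexed by step and state) and delegates action selection to a
# per-(step, state) adversarial bandit over its OWN actions, so nothing in
# the update depends on the joint action space of all agents.
import numpy as np


class Exp3Bandit:
    """EXP3-style adversarial bandit over one agent's own action set."""

    def __init__(self, num_actions, lr=0.1):
        self.weights = np.zeros(num_actions)
        self.lr = lr

    def policy(self):
        w = self.weights - self.weights.max()
        p = np.exp(w)
        return p / p.sum()

    def sample(self, rng):
        return rng.choice(len(self.weights), p=self.policy())

    def update(self, action, loss):
        # Importance-weighted loss estimate for the chosen action only.
        p = self.policy()[action]
        self.weights[action] -= self.lr * loss / max(p, 1e-8)


class VLearningAgent:
    """One agent running V-learning independently (fully decentralized)."""

    def __init__(self, horizon, num_states, num_actions, seed=0):
        self.H = horizon
        self.V = np.full((horizon + 1, num_states), float(horizon))
        self.V[horizon] = 0.0  # no reward collected after the horizon
        self.visits = np.zeros((horizon, num_states), dtype=int)
        self.bandits = [[Exp3Bandit(num_actions) for _ in range(num_states)]
                        for _ in range(horizon)]
        self.rng = np.random.default_rng(seed)

    def act(self, h, s):
        return self.bandits[h][s].sample(self.rng)

    def observe(self, h, s, a, reward, next_s):
        # Incremental update of V-values (no Q-values are maintained).
        self.visits[h][s] += 1
        t = self.visits[h][s]
        alpha = (self.H + 1) / (self.H + t)      # illustrative step size
        bonus = 0.1 * np.sqrt(self.H / t)        # illustrative optimism bonus
        target = reward + self.V[h + 1][next_s] + bonus
        self.V[h][s] = min(self.H - h,
                           (1 - alpha) * self.V[h][s] + alpha * target)
        # Feed the bandit a loss built only from this agent's own observations.
        loss = (self.H - reward - self.V[h + 1][next_s]) / self.H
        self.bandits[h][s].update(a, loss)
```

In a multiagent run, every player would instantiate its own `VLearningAgent` with its own action count and interact with the shared environment independently, which is the sense in which the sample complexity depends on $\max_{i\in[m]} A_i$ rather than $\prod_{i=1}^m A_i$.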
