Poster

When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?

Ziang Song · Song Mei · Yu Bai

Keywords: [ Reinforcement learning theory ]


Abstract: Multi-agent reinforcement learning has made substantial empirical progress in solving games with a large number of players. Theoretically, however, the best known sample complexity for finding a Nash equilibrium in general-sum games scales exponentially in the number of players due to the size of the joint action space, and there is a matching exponential lower bound. This paper investigates which learning goals admit better sample complexities in the setting of $m$-player general-sum Markov games with $H$ steps, $S$ states, and $A_i$ actions per player. First, we design algorithms for learning an $\epsilon$-Coarse Correlated Equilibrium (CCE) in $\widetilde{O}(H^5 S \max_{i \le m} A_i / \epsilon^2)$ episodes, and an $\epsilon$-Correlated Equilibrium (CE) in $\widetilde{O}(H^6 S \max_{i \le m} A_i^2 / \epsilon^2)$ episodes. This is the first line of results for learning CCE and CE with sample complexities polynomial in $\max_{i \le m} A_i$. Our algorithm for learning CE integrates an adversarial bandit subroutine which minimizes a weighted swap regret, along with several novel designs in the outer loop. Second, we consider the important special case of Markov Potential Games, and design an algorithm that learns an $\epsilon$-approximate Nash equilibrium within $\widetilde{O}(S \sum_{i \le m} A_i / \epsilon^3)$ episodes (when only highlighting the dependence on $S$, $A_i$, and $\epsilon$), which depends only linearly on $\sum_{i \le m} A_i$ and significantly improves over the existing efficient algorithm in the $\epsilon$ dependence. Overall, our results shed light on what equilibria or structural assumptions on the game may enable sample-efficient learning with many players.
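To make the per-player scaling concrete, below is a minimal, hypothetical sketch (not the paper's algorithm) of an EXP3-style adversarial bandit that each player keeps independently over its own $A_i$ actions. The paper's CCE and CE algorithms build on per-player adversarial bandit subroutines of this general flavor, which is why their sample complexities scale with $\max_{i \le m} A_i$ rather than with the joint action space $\prod_{i \le m} A_i$. All class names, variable names, and the learning-rate choice here are illustrative assumptions.

```python
# Illustrative sketch only: a single player's EXP3-style adversarial bandit
# over its own A_i actions. Running one such bandit per player keeps the
# per-episode state of size roughly sum_i A_i, not prod_i A_i.
import numpy as np

class EXP3Player:
    def __init__(self, num_actions: int, horizon: int, seed: int = 0):
        self.A = num_actions
        # Standard EXP3 learning rate for losses in [0, 1] over `horizon` rounds.
        self.eta = np.sqrt(np.log(num_actions) / (num_actions * horizon))
        self.weights = np.ones(num_actions)
        self.rng = np.random.default_rng(seed)
        self.last_probs = None
        self.last_action = None

    def act(self) -> int:
        # Sample an action from the current multiplicative-weights distribution.
        probs = self.weights / self.weights.sum()
        self.last_probs = probs
        self.last_action = int(self.rng.choice(self.A, p=probs))
        return self.last_action

    def update(self, loss: float) -> None:
        # Importance-weighted loss estimate for the chosen action only
        # (bandit feedback), followed by a multiplicative-weights update.
        est = np.zeros(self.A)
        est[self.last_action] = loss / self.last_probs[self.last_action]
        self.weights *= np.exp(-self.eta * est)

# Each of the m players runs its own bandit over its own action set A_i
# (hypothetical action counts shown below).
players = [EXP3Player(num_actions=A_i, horizon=1000) for A_i in (3, 5, 2)]
```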
