MARL2Grid-TR: A Multi-Agent RL Benchmark in Power Grid Operations
Abstract
Improving power grid operations is essential for enhancing flexibility and accelerating grid decarbonization. Reinforcement learning (RL) has shown promise in this domain, most notably through the Learning to Run a Power Network (L2RPN) competition series, but prior work has primarily focused on single-agent settings, neglecting the often decentralized, multi-agent nature of grid control. We fill this gap with MARL2Grid-TR, the first multi-agent RL (MARL) benchmark for grid topology and redispatching, developed in collaboration with transmission system operators. Built on RTE France’s high-fidelity simulation platform, our benchmark supports decentralized control across substations and generators, with configurable agent scopes, observability settings, expert-informed heuristics, and safety-critical constraints. The benchmark includes a suite of realistic scenarios that expose key challenges, such as coordination under partial information, long-horizon objectives, and adherence to hard physical constraints. Empirical results show that current MARL methods struggle under these real-world conditions. By providing a standardized, extensible platform, we aim to advance the development of scalable, cooperative, and safe learning algorithms for power grids.