Benchmarking Multi-Agent Reinforcement Learning in Power Grid Operations
Abstract
Improving power grid operations is essential for increasing flexibility and accelerating grid decarbonization. Reinforcement learning (RL) has shown promise in this domain, most notably through the Learning to Run a Power Network (L2RPN) competitions, but prior work has focused primarily on single-agent settings, neglecting the decentralized, multi-agent nature of grid control. We fill this gap with MARL2Grid, the first benchmark for multi-agent RL (MARL) in power grid operations, developed in collaboration with transmission system operators. Built on RTE France's high-fidelity simulation platform, MARL2Grid supports decentralized control across substations and generators, with configurable agent scopes, observability settings, expert-informed heuristics, and safety-critical constraints. The benchmark includes a suite of realistic scenarios that expose key challenges: coordination under partial information, long-horizon objectives, and adherence to hard physical constraints. Empirical results show that current MARL methods struggle under these real-world conditions. By providing a standardized, extensible platform, MARL2Grid aims to advance the development of scalable, cooperative, and safe learning algorithms for power system operations.