PRO-MOF: Policy Optimization with Universal Atomistic Models for Controllable MOF Generation
Abstract
Inverse design of metal-organic frameworks (MOFs), that is, generating physically stable and novel structures that meet specific performance targets, remains a significant challenge. Existing generative models often struggle to explore the vast chemical and structural space effectively, leading to suboptimal solutions or mode collapse. To address this, we propose PRO-MOF, a hierarchical reinforcement learning (HRL) framework for controllable MOF generation. Our approach decouples MOF design into two policies: a high-level policy that proposes chemical building blocks and a low-level policy that assembles them into 3D structures. By converting the deterministic Flow Matching model into a Stochastic Differential Equation (SDE), we enable the low-level policy to perform effective exploration. The framework is optimized in a closed loop with high-fidelity physical reward signals provided by a pre-trained universal atomistic model (UMA). Furthermore, we introduce a Pass@K Group Relative Policy Optimization (GRPO) scheme that balances exploration and exploitation by rewarding in-group diversity. Experiments on multiple inverse design tasks, such as maximizing CO2 working capacity and targeting specific pore diameters, show that PRO-MOF significantly outperforms existing baselines, including diffusion-based methods and genetic algorithms, in both success rate and the discovery of top-performing materials. Our work demonstrates that hierarchical reinforcement learning combined with a high-fidelity physical environment is a powerful paradigm for complex materials discovery problems.
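To make the group-relative update concrete, the sketch below computes GRPO-style advantages for one group of sampled MOF candidates and adds a simple diversity bonus in the spirit of the Pass@K scheme. The function name, the additive form of the bonus, and the reward interface are illustrative assumptions, not the paper's exact formulation; only the mean/std group normalization follows standard GRPO.

```python
import numpy as np

def grpo_advantages(rewards, diversities, k_bonus_weight=0.1, eps=1e-8):
    """Group-relative advantages for one group of sampled candidates.

    rewards:     per-candidate scalar rewards from the physics-based
                 evaluator (e.g., a UMA-derived stability/property score).
    diversities: per-candidate dissimilarity to the rest of the group
                 (hypothetical Pass@K-style diversity term).
    """
    rewards = np.asarray(rewards, dtype=float)
    diversities = np.asarray(diversities, dtype=float)

    # Pass@K-style shaping (assumed form): favor candidates that are both
    # high-reward and different from their group mates.
    shaped = rewards + k_bonus_weight * diversities

    # Standard GRPO normalization: advantage relative to the group.
    return (shaped - shaped.mean()) / (shaped.std() + eps)

# Example: one group of four generated MOFs.
adv = grpo_advantages(rewards=[0.8, 0.5, 0.9, 0.2],
                      diversities=[0.1, 0.7, 0.3, 0.9])
print(adv)  # highest advantage goes to high-reward, high-diversity samples
```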