Lyrebird: Toward Robust and Generalizable 3D Molecular Conformer Generation via Equivariant Flows
Abstract
Recent generative models for 3D molecular conformer generation have made impressive progress, but data and benchmarks remain limited, and existing benchmarks often fail to evaluate how useful and trustworthy these models are as computational chemistry tools. We introduce Lyrebird, a general-purpose model for 3D molecular conformer generation built on the ET-Flow (Equivariant Transformer Flow) architecture, and evaluate its generalization by training jointly on Butina-split datasets of drug-like molecules from GEOM-Drugs and GEOM-QM9 and macrocyclic peptides from CREMP. We also introduce MPCONF196GEN, a macrocyclic conformer generation benchmark set derived from the MPCONF196 energy benchmark set, together with an energy-based benchmark that evaluates both conformer sampling within the lowest-energy basin and the degree of structural relaxation in generated conformers. Lyrebird matches state-of-the-art ML methods and outperforms ETKDGv3 on coverage and matching metrics for drug-like molecules, and it improves performance on macrocycles over models trained only on GEOM-Drugs.