Synthesis-constrained discrete diffusion for ionizable lipid generation
Abstract
Ionizable lipids are the critical component of lipid nanoparticles for mRNA delivery, yet their discovery remains bottlenecked by library enumeration. Existing machine learning approaches can rank pre-defined candidates but cannot generate novel structures. We introduce synthesis-constrained diffusion, the first deep generative model for ionizable lipids, which embeds combinatorial chemistry constraints directly into scaffold-conditioned generation. Our proof of concept enforces Ugi scaffold integrity by construction: core bonds formed by the reaction mechanism are fixed throughout diffusion, while region-aware noise distributions capture the distinct chemistry of ionizable heads versus lipophilic tails. A three-stage curriculum (pretraining on drug-like molecules, domain adaptation on virtual lipids, and property-conditioned fine-tuning) enables learning from limited experimental data. When we demonstrate this framework on Ugi-based lipid synthesis, 99% of generated samples are chemically valid with intact scaffolds and 62% are novel. In silico, the top candidate achieves 2× higher predicted transfection than the training mean.
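To make the phrase "fixed throughout diffusion" concrete, the following is a minimal sketch of one forward noising step over discrete bond types. It is an illustration under stated assumptions, not the authors' implementation: the function `noise_bonds`, the bond vocabulary, the toy graph, and the numbers in the region-specific noise distributions are all hypothetical. Edges marked as scaffold bonds are copied through unchanged, and every other edge is resampled with probability `t` from the noise distribution of its region (head or tail).

```python
import random

# Hypothetical bond vocabulary for a toy molecular graph.
BOND_TYPES = ["none", "single", "double", "aromatic"]

# Hypothetical region-specific noise distributions over BOND_TYPES.
# The numbers are illustrative only; they are not taken from the paper.
NOISE = {
    "head": [0.70, 0.20, 0.05, 0.05],
    "tail": [0.60, 0.35, 0.05, 0.00],
}


def noise_bonds(bonds, fixed, region_of, t, rng):
    """Apply one forward noising step at noise level t in [0, 1].

    bonds:     dict mapping an edge (i, j) to its bond type
    fixed:     set of scaffold edges that must never be altered
    region_of: dict mapping each non-fixed edge to "head" or "tail"
    """
    noised = {}
    for edge, bond in bonds.items():
        if edge in fixed:
            # Scaffold bond: copied through unchanged at every noise level.
            noised[edge] = bond
        elif rng.random() < t:
            # Corrupt the bond using the noise distribution of its region.
            weights = NOISE[region_of[edge]]
            noised[edge] = rng.choices(BOND_TYPES, weights=weights, k=1)[0]
        else:
            noised[edge] = bond
    return noised


# Toy example: two scaffold bonds, one head bond, one tail bond.
bonds = {(0, 1): "single", (1, 2): "double", (2, 3): "single", (3, 4): "single"}
fixed = {(0, 1), (1, 2)}
region_of = {(2, 3): "head", (3, 4): "tail"}

noised = noise_bonds(bonds, fixed, region_of, t=1.0, rng=random.Random(0))
```

Even at the maximum noise level `t=1.0`, the two scaffold edges keep their original bond types, which is the sense in which integrity holds by construction rather than being learned or checked after the fact. A reverse (denoising) model under the same assumption would only need to predict the non-fixed edges.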