Ligand-Conditioned Protein Sequence Design with L-Caliby
Abstract
Designing proteins that bind small-molecule ligands requires sequence design methods that account for ligand interactions while maintaining robust backbone foldability. LigandMPNN, the current standard for ligand- and structure-conditioned sequence design, achieves high native sequence recovery yet produces sequences with limited AlphaFold3 (AF3) self-consistency. Caliby has shown that Potts model-based sequence design frameworks have a better inductive bias for learning structure-sequence relationships, achieving state-of-the-art AlphaFold2 self-consistency for protein-only design. However, these frameworks were built exclusively for protein backbones. Here, we introduce L-Caliby, a Potts model for ligand-conditioned protein sequence design, which augments the Potts-based sequence design architecture with LigandMPNN's protein-ligand interaction module. On 147 held-out native protein-ligand complexes, L-Caliby outperforms LigandMPNN on ligand placement success rate (pocket-aligned ligand RMSD ≤ 5 Å; 11.2% vs. 1.4%) and median self-consistency RMSD (scRMSD; 6.45 Å vs. 16.34 Å), as evaluated by AF3 in single-sequence mode. On 1,000 de novo protein-ligand complexes generated by RFdiffusion3 with unseen ligands, L-Caliby achieves a ligand placement success rate of 12.8% and scRMSD of 2.22 Å, compared to 10.4% and 6.48 Å for LigandMPNN. These results demonstrate that the Potts model's inductive bias, combined with ligand conditioning, enables more robust and generalizable sequence design for ligand-binding proteins.
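As an illustration of the ligand placement criterion above, the following is a minimal sketch of how a pocket-aligned ligand RMSD and the resulting success rate could be computed. It is not the evaluation code used in this work. The function names (`kabsch`, `pocket_aligned_ligand_rmsd`, `success_rate`) are hypothetical, and the sketch assumes that pocket and ligand atoms are already matched one-to-one between the predicted and reference structures and that ligand symmetry is ignored. How the pocket atoms are selected is left to the caller.

```python
import numpy as np


def kabsch(P, Q):
    """Rigid transform (R, t) that best maps points P onto Q in the least-squares sense.

    P, Q: (N, 3) arrays with matched rows. Rows of P map as P @ R.T + t.
    """
    Pc, Qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - Pc).T @ (Q - Qc)              # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = Qc - R @ Pc
    return R, t


def pocket_aligned_ligand_rmsd(pred_pocket, ref_pocket, pred_ligand, ref_ligand):
    """Superimpose on pocket atoms only, then measure RMSD over ligand atoms."""
    R, t = kabsch(np.asarray(pred_pocket, float), np.asarray(ref_pocket, float))
    moved = np.asarray(pred_ligand, float) @ R.T + t
    diff = moved - np.asarray(ref_ligand, float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def success_rate(rmsds, threshold=5.0):
    """Fraction of complexes with pocket-aligned ligand RMSD <= threshold (in Å)."""
    return float(np.mean(np.asarray(rmsds, float) <= threshold))
```

Because the superposition is fitted on pocket atoms alone, the ligand RMSD reflects where the ligand sits relative to the binding site rather than how well the ligand itself can be overlaid. A single success rate for a benchmark then follows from applying `success_rate` to the per-complex RMSD values.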