DiffAntiSeq: Target-Steered Diffusion in Latent Sequence Space for Antibody Library Design
Abstract
Antibodies are among the most versatile molecules in therapeutic discovery, yet computational antibody library design remains challenging when evolutionary signals from multiple sequence alignments are sparse or unreliable. We present DiffAntiSeq, a controllable diffusion-based generative framework for efficient, target-specific antibody sequence design. DiffAntiSeq performs non-autoregressive denoising in a continuous latent space of residue embeddings, enabling global sequence refinement that autoregressive and discrete diffusion models do not readily support. To steer generation toward desired functional outcomes, we incorporate gradient-based classifier guidance derived from protein language models trained to predict antibody–antigen binding affinity and specificity. We evaluate DiffAntiSeq on large-scale antibody sequence and binding data from the AlphaSeq platform and apply it to the design of thousands of single-chain variable fragment (scFv) antibodies targeting a SARS-CoV-2 peptide. Across extensive in silico analyses and structure-based validation, DiffAntiSeq consistently outperforms state-of-the-art machine-learning-driven evolution methods, producing antibody libraries with substantially stronger binding while maintaining meaningful sequence diversity. These results demonstrate that controllable diffusion in continuous latent sequence space provides an effective and scalable paradigm for antibody library design in data-sparse and structure-limited settings.
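For readers unfamiliar with gradient-based classifier guidance, its standard score-based form can be sketched as follows. This is the generic formulation from the diffusion literature, not necessarily the exact one used in DiffAntiSeq. The notation is assumed here for illustration: $z_t$ denotes the noisy latent residue embeddings at diffusion step $t$, $y$ the desired binding property, $s_\theta$ the unconditional score (denoising) network, $p_\phi$ the language-model-based property predictor, and $\gamma$ a guidance scale.

```latex
% Bayes' rule: log p(z_t | y) = log p(z_t) + log p(y | z_t) - log p(y).
% The last term does not depend on z_t, so it vanishes under the gradient.
\begin{align*}
\nabla_{z_t} \log p(z_t \mid y)
  &= \nabla_{z_t} \log p(z_t) + \nabla_{z_t} \log p(y \mid z_t), \\
% Guided score: unconditional score plus a scaled predictor gradient.
\tilde{s}(z_t, t, y)
  &= s_\theta(z_t, t) + \gamma \, \nabla_{z_t} \log p_\phi(y \mid z_t, t).
\end{align*}
```

With $\gamma = 1$ the guided score equals the conditional score from Bayes' rule. Larger values of $\gamma$ push samples more strongly toward the target property, typically at some cost in diversity.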