Poster in Workshop: Generative and Experimental Perspectives for Biomolecular Design
p-IgGen: A Paired Antibody Generative Language Model
Oliver Turnbull · Dino Oglic · Charlotte Deane
An effective therapeutic antibody must bind both strongly and specifically to its target while being free from developability issues such as aggregation, polyspecificity, poor expression, or low solubility. A key challenge in antibody drug discovery is designing novel sequences that avoid these issues, which often arise from the 3D biophysical properties of the antibody. Antibodies consist of two paired chains (heavy and light), and both chains and their interaction can be important in determining developability. Currently, no antibody language model is capable of generating paired sequences, a capability that is crucial for fully considering developability. Here, we present p-IgGen, a decoder-only language model for paired heavy-light chain generation. We show that generated sequences are diverse, antibody-like, and exhibit pairing properties found in natural sequences. p-IgGen achieves state-of-the-art performance on zero-shot predictive tasks, outperforming much larger models. We also demonstrate how the model can be biased through finetuning to generate sequences with desired structural properties: specifically, we bias it to generate antibodies whose 3D biophysical properties fall within the distributions seen in clinical-stage therapeutic antibodies.