Poster in Workshop: Generative and Experimental Perspectives for Biomolecular Design
Evaluating predictive patterns of antigen-specific B cells by single-cell transcriptome and antibody repertoire sequencing
Lena Erlach · Raphael Kuhn · Andreas Agrafiotis · Danielle Shlesinger · Alexander Yermanos · Sai Reddy
The field of antibody drug discovery relies substantially on extensive experimental screening of B cells from immunized animals. Machine learning (ML)-guided prediction of antigen-specific B cells could accelerate antibody drug discovery; however, this requires sufficient labeled training data. To address this challenge, our study focuses on antigen specificity prediction using a novel dataset of B cells profiled by single-cell transcriptome and antibody repertoire sequencing. We identify key patterns in gene expression (GEX) indicative of antigen specificity and characterize the distribution of sequence diversity among antigen-specific antibody sequences in immune repertoire data. We evaluate linear (Logistic Regression), non-linear (Support Vector Classification), and ensemble-based (Random Forest, Gradient Boosting) models trained on different combinations of GEX and antibody sequence features. Additionally, we evaluate transfer learning approaches that use features generated by ESM-2, a general protein language model (PLM), and by AntiBERTy, an antibody-specific PLM, as inputs to these models. Our findings reveal that GEX-based models outperform antibody sequence-based models in specificity prediction, reaching F1 scores of up to 0.939, which highlights the intricate nature of immune repertoire modeling. Contrary to our expectations, PLM features did not improve predictive accuracy. Our research contributes to the computational discovery of antibody therapeutics, offers insights into B cell biology, and provides a dataset for the development of ML approaches in this field.
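The evaluation setup described above (one classifier trained on GEX features, sequence features, or their concatenation, and scored by F1 on held-out cells) can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' pipeline: it uses a hand-written logistic regression in place of the other model families, and the data come from a hypothetical generator (`make_cells`) in which one "GEX" feature carries label signal while the "sequence" features are pure noise. All helper names and parameters (`combine_features`, `compare_feature_sets`, `lr`, `epochs`, split sizes) are assumptions for illustration only, and the resulting scores say nothing about the study's real data.

```python
import math
import random


def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = antigen-specific)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:  # also avoids dividing by zero when nothing is predicted positive
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def combine_features(gex, seq):
    """Concatenate each cell's GEX vector with its antibody-sequence vector."""
    return [g + s for g, s in zip(gex, seq)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, lr=0.1, epochs=200):
    """Logistic regression fitted by plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w, b, X):
    """Threshold predicted probabilities at 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


def make_cells(rng, n):
    """Hypothetical toy cells: feature 0 of GEX is label-dependent, the rest is noise."""
    gex, seq, y = [], [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        gex.append([2.0 * label - 1.0 + rng.gauss(0, 0.5), rng.gauss(0, 1)])
        seq.append([rng.gauss(0, 1), rng.gauss(0, 1)])  # uninformative by construction
        y.append(label)
    return gex, seq, y


def compare_feature_sets(seed=0, n_train=200, n_test=100):
    """Train one model per feature set and return held-out F1 scores."""
    rng = random.Random(seed)
    gex, seq, y = make_cells(rng, n_train + n_test)
    feature_sets = {"GEX": gex, "SEQ": seq, "GEX+SEQ": combine_features(gex, seq)}
    results = {}
    for name, X in feature_sets.items():
        w, b = train_logreg(X[:n_train], y[:n_train])
        results[name] = f1_score(y[n_train:], predict(w, b, X[n_train:]))
    return results


if __name__ == "__main__":
    for name, score in compare_feature_sets().items():
        print(f"{name}: F1 = {score:.3f}")
```

Swapping in the other model families named in the abstract would only change `train_logreg`/`predict`; the feature-combination and F1 scoring steps stay the same.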