Fine-tuning Protein Language Models with Deep Mutational Scanning improves Variant Effect Prediction
Aleix Lafita ⋅ Ferran Gonzalez ⋅ Mahmoud Hossam ⋅ Paul Smyth ⋅ Jacob Deasy ⋅ Ari Allyn-Feuer ⋅ Daniel Seaton ⋅ Stephen Young
2024 Poster
in
Workshop: Machine Learning for Genomics Explorations (MLGenX)
Abstract
Protein Language Models (PLMs) have emerged as performant and scalable tools for predicting the functional impact and clinical significance of protein-coding variants, but they still lag experimental accuracy. Here, we present a novel fine-tuning approach to improve the performance of PLMs with experimental maps of variant effects from Deep Mutational Scanning (DMS) assays using a Normalised Log-odds Ratio (NLR) head. We find consistent improvements in a held-out protein test set, and on independent DMS and clinical variant annotation benchmarks from ProteinGym and ClinVar. These findings demonstrate that DMS is a promising source of sequence diversity and supervised training data for improving the performance of PLMs for variant effect prediction.
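
The abstract does not define the NLR head. As an illustration of the general log-odds idea that PLM-based variant scoring commonly builds on, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical `logits` array of shape (sequence length, 20) holding per-position amino acid scores from some PLM, and it computes only the plain log-odds ratio between a mutant and the wild-type residue. The normalisation step used by the NLR head is not described here, so it is deliberately left out.

```python
import numpy as np

# Assumed residue ordering for the 20 standard amino acids (illustrative only;
# a real PLM defines its own vocabulary order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def log_odds_ratio(logits, wild_type, position, mutant):
    """Log-odds of a mutant residue versus the wild-type residue.

    logits:    hypothetical (L, 20) array of per-position PLM scores
    wild_type: wild-type protein sequence of length L
    position:  0-based index of the substituted residue
    mutant:    single-letter code of the substituted amino acid
    """
    log_probs = log_softmax(logits)
    wt_idx = AMINO_ACIDS.index(wild_type[position])
    mt_idx = AMINO_ACIDS.index(mutant)
    return float(log_probs[position, mt_idx] - log_probs[position, wt_idx])
```

Under this sketch, a negative score means the model considers the mutant residue less likely than the wild type at that position, which is usually read as a more damaging substitution. A supervised head trained on DMS measurements, as the abstract describes, would learn from experimental variant effects rather than rely on this zero-shot score alone.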