

Poster in Workshop: Integrating Generative and Experimental Platforms for Biomolecular Design

From Minimal Data To Maximal Insight: A Machine Learning Guided Platform For Peptide Discovery

Pouriya Bayat · Spencer Perkins · Sebastian Clancy · Sahil Patel · Richard Yin · Krištof Bozovičar · Idorenyin IWE · Mohammad Simchi · Ilan Zeisler · Serena Singh · Vivian White · Matthew Xie · Sean Palter · Keith Pardee


Abstract:

Peptide biologics represent a promising therapeutic frontier, but their discovery and optimization are often hindered by the large training datasets that machine learning approaches require. Here we present Minimal Data Maximal Insight (MDMI), a novel computational method that enables peptide discovery from limited data (~100 sequences). Using a split Green Fluorescent Protein (GFP) system as our model, we build an ensemble predictive model that combines a sequence-agnostic model with statistical potential scoring and physics-based evaluation. This is coupled with a genetic algorithm for sequence optimization. After only one round of screening, the model yielded novel sequences, 63% of which exhibited fluorescence. Notably, by analyzing high-activity sequences to identify favorable amino acids at each position, we were able to design peptide variants with more than 50% sequence difference from the wild type (far exceeding the mutation rates present in our training data) while maintaining functionality. By reducing dependency on large datasets, MDMI democratizes access to advanced computational tools for peptide engineering and offers a blueprint for accelerating therapeutic peptide discovery across various applications, from antimicrobials to targeted drug delivery.
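
As a rough illustration of two ideas named in the abstract (scoring sequences by the amino acids favored at each position, and optimizing with a genetic algorithm), the following minimal Python sketch may help. It is not the MDMI implementation: the abstract gives no code, the physics-based evaluation and the ensemble are not shown, and every function name, parameter, and sequence here is a hypothetical choice made for the example.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_scores(active_seqs, pseudocount=1.0):
    """Per-position log-frequency table built from equal-length active sequences.

    Assumption: a simple smoothed frequency table stands in for the
    statistical potential mentioned in the abstract.
    """
    length = len(active_seqs[0])
    total = len(active_seqs) + pseudocount * len(AMINO_ACIDS)
    tables = []
    for i in range(length):
        counts = Counter(seq[i] for seq in active_seqs)
        tables.append(
            {aa: math.log((counts[aa] + pseudocount) / total) for aa in AMINO_ACIDS}
        )
    return tables


def score(seq, tables):
    """Sum of per-position log-frequencies; higher means closer to the actives."""
    return sum(tables[i][aa] for i, aa in enumerate(seq))


def mutate(seq, rng, rate=0.1):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def crossover(a, b, rng):
    """Single-point crossover of two equal-length sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def fraction_different(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def evolve(seed_seqs, tables, generations=30, pop_size=40, keep=10, seed=0):
    """Elitist genetic algorithm: keep the top `keep`, refill by crossover + mutation."""
    rng = random.Random(seed)
    population = list(seed_seqs)
    for _ in range(generations):
        population.sort(key=lambda s: score(s, tables), reverse=True)
        parents = population[:keep]  # elitism: best sequences always survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=lambda s: score(s, tables))


# Toy 8-residue sequences, invented for the example (not data from the poster).
actives = ["ACDEFGHK", "ACDQFGHI", "TCDEFGHI", "ACDEFGHI"]
tables = position_scores(actives)
best = evolve(actives, tables)
```

In a real workflow the scoring function would presumably be replaced by the ensemble predictor, and `fraction_different` could be used to check how far a proposed variant lies from the wild type before it is sent for screening.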
