

Poster

Data Distillation for extrapolative protein design through exact preference optimization

Mostafa Karimi · Sharmi Banerjee · Tommi Jaakkola · Bella Dubrov · Shang Shang · Ron Benson

Hall 3 + Hall 2B #597
Sat 26 Apr midnight PDT — 2:30 a.m. PDT

Abstract:

The goal of protein design typically involves increasing fitness (extrapolating) beyond what is seen during training (e.g., towards higher stability, stronger binding affinity, etc.). State-of-the-art methods assume that one can safely steer proteins towards such extrapolated regions by learning from pairs alone. We hypothesize that noisy training pairs are not sufficiently informative to capture the fitness gradient, and that models learned only from pairs may fail to capture three-way relations important for search, e.g., how two alternatives fare relative to a seed. Building on the success of preference alignment models in large language models, we introduce a progressive search method for extrapolative protein design by directly distilling the relevant triplet relations into the model. We evaluated our model's performance in designing AAV and GFP proteins and demonstrated that the proposed framework significantly improves effectiveness in extrapolation tasks.
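As a rough illustration of the triplet idea only (the abstract does not give the exact objective), the sketch below shows a DPO-style logistic loss in which both candidate sequences are scored not just against each other but also relative to a shared seed. The function name, the assumed fitness ordering (better > seed > worse), and the temperature `beta` are all illustrative assumptions, not the authors' formulation.

```python
import torch
import torch.nn.functional as F

def triplet_preference_loss(
    logp_better: torch.Tensor,      # policy log-prob of the higher-fitness candidate
    logp_worse: torch.Tensor,       # policy log-prob of the lower-fitness candidate
    logp_seed: torch.Tensor,        # policy log-prob of the seed sequence
    ref_logp_better: torch.Tensor,  # reference-model log-probs of the same sequences
    ref_logp_worse: torch.Tensor,
    ref_logp_seed: torch.Tensor,
    beta: float = 0.1,              # illustrative temperature, as in DPO
) -> torch.Tensor:
    """Hypothetical triplet extension of a DPO-style preference loss.

    Assumes the triplet is ordered better > seed > worse in fitness, so the
    loss sees the full three-way relation rather than a single pair.
    """
    # Implicit rewards: scaled log-ratio of policy to reference (DPO-style).
    r_better = beta * (logp_better - ref_logp_better)
    r_worse = beta * (logp_worse - ref_logp_worse)
    r_seed = beta * (logp_seed - ref_logp_seed)

    # Logistic losses on all three pairwise margins implied by the triplet.
    loss = (
        -F.logsigmoid(r_better - r_worse)
        - F.logsigmoid(r_better - r_seed)
        - F.logsigmoid(r_seed - r_worse)
    )
    return loss.mean()
```

In this sketch the seed acts as an anchor: a model trained only on the first term (candidate vs. candidate) would match standard pairwise preference optimization, while the two seed-relative terms are one plausible way to encode the three-way relation the abstract argues pairs alone miss.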
