Poster in Workshop: Integrating Generative and Experimental Platforms for Biomolecular Design

On improving experimental binding affinity predictions with synthetic data

Kevin Ryczko ⋅ Phyo Phyo Zin ⋅ Jordan Crivelli-Decker ⋅ Ly Le ⋅ Punit Jha ⋅ Benjamin Shields ⋅ Pablo Lemos ⋅ Sasaank Bandi ⋅ Maarten Van Damme ⋅ Martin Ganahl ⋅ Andrea Bortolato


Abstract

The success of deep learning models for binding affinity prediction depends critically on expanding experimental data with reliable synthetic data. We extend the Structurally Augmented IC50 Repository (SAIR) with physics-based computations and present two distinct data splits, SAIR-FEP and SAIR-OOD. For SAIR-FEP, we perform $\approx$80K absolute free energy perturbation (AFEP) calculations and curate two train/test splits to simulate realistic drug discovery scenarios. The free energy of binding and other physics-based computations are then used as input features. We compare the performance of proteochemometric and state-of-the-art structure-based deep learning models and show that including physics-based features improves predictions, and that the quality of the structure plays a key role in their performance. For SAIR-OOD, we remove SAIR entries that overlap with complexes in public-facing benchmarks and demonstrate that simultaneous training on synthetic and experimental data improves performance on public-facing experimental benchmarks.
