On improving experimental binding affinity predictions with synthetic data
Kevin Ryczko ⋅ Phyo Phyo Zin ⋅ Jordan Crivelli-Decker ⋅ Ly Le ⋅ Punit Jha ⋅ Benjamin Shields ⋅ Pablo Lemos ⋅ Sasaank Bandi ⋅ Maarten Van Damme ⋅ Martin Ganahl ⋅ Andrea Bortolato
Abstract
The success of deep learning binding affinity prediction models depends critically on expanding experimental data with reliable synthetic data. We extend the Structurally Augmented IC50 Repository (SAIR) with physics-based computations and present two distinct data splits, SAIR-FEP and SAIR-OOD. For SAIR-FEP, we perform $\approx$80K absolute free energy perturbation (AFEP) calculations and curate two train/test splits to simulate realistic drug discovery scenarios. The free energy of binding and other physics-based computations are then used as input features. We compare the performance of proteochemometric and state-of-the-art structure-based deep learning models and show that including physics-based features improves predictions, and that the quality of the structure plays a key role in model performance. For SAIR-OOD, we remove SAIR entries that overlap with complexes in public-facing benchmarks and demonstrate that simultaneous training on synthetic and experimental data improves performance on public-facing experimental benchmarks.