SNOOPPI: Sequence-Normalized Database of On- and Off-Target Protein-Protein Interactions
Abstract
The set of physical protein-protein interactions (PPIs) realized in a cell defines a functional proteome whose interaction patterns constrain and characterize cellular state. PPIs are therefore a central means by which biological processes are executed and through which therapeutic interventions act. Here, we introduce SNOOPPI, a Sequence-Normalized database of On- and Off-target Protein-Protein Interactions, the first unified dataset of binary PPIs that accounts for isoforms, post-translational modifications, mutations, and binding sites. By defining a PPI as a direct, physical interaction between two amino acid sequences, SNOOPPI overcomes several persistent limitations of existing PPI databases. SNOOPPI was curated from the IntAct database, making full use of its experimental metadata and feature annotations to reclassify PPIs and uncover new ones. The final dataset comprises over 35.2K positive interactions and 5.3K negative interactions. SNOOPPI also retains 834.3K unresolved interactions, explicitly capturing gaps in the experimental literature. Beyond its value as a reference dataset for the scientific community, SNOOPPI has the potential to serve as a high-confidence foundation for sequence-based modeling, benchmarking, and generative design of novel protein perturbations.
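To make the sequence-normalized definition of a PPI concrete, the sketch below keys each binary interaction on the two amino acid sequences themselves rather than on gene or protein accessions. It rests on assumptions that are not the SNOOPPI schema: the names `seq_id` and `SequencePPI`, the SHA-256 sequence digest, and the three-way label (positive, negative, unresolved) are hypothetical, and the toy sequences are invented for illustration. Under this keying, swapping the two partners yields the same interaction, while a single-residue change (a mutation or an isoform difference) yields a distinct one.

```python
import hashlib
from dataclasses import dataclass
from typing import Literal, Tuple

# Hypothetical label set mirroring the three outcome classes named in the abstract.
Label = Literal["positive", "negative", "unresolved"]


def seq_id(sequence: str) -> str:
    """Hypothetical identifier: SHA-256 digest of the upper-cased, whitespace-free sequence."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


@dataclass(frozen=True)
class SequencePPI:
    """A binary PPI defined by two amino acid sequences (illustrative, not the real schema)."""

    seq_a: str
    seq_b: str
    label: Label

    @property
    def key(self) -> Tuple[str, str]:
        # Order-independent key: (A, B) and (B, A) describe the same binary interaction.
        a, b = seq_id(self.seq_a), seq_id(self.seq_b)
        return (a, b) if a <= b else (b, a)


# Toy sequences, invented for illustration only.
wild_type = SequencePPI("MKTAYIAK", "GSHMLEDP", "positive")
swapped = SequencePPI("GSHMLEDP", "MKTAYIAK", "positive")
mutant = SequencePPI("MKTAYIAR", "GSHMLEDP", "negative")  # single K->R substitution

print(wild_type.key == swapped.key)  # partner order does not matter
print(wild_type.key == mutant.key)   # a one-residue change gives a distinct interaction
```

Hashing the normalized sequence keeps keys fixed-length and comparable regardless of sequence length; a real resource might instead use an established sequence checksum, which this sketch does not attempt to reproduce.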