I2Mole: Interaction-aware Invariant Molecular Learning For Generalizable Property Prediction
Abstract
Molecular interactions are common in physical chemistry and can produce unexpected biochemical properties harmful to humans, as in drug-drug interactions. Machine learning has the potential to deliver rapid and accurate predictions of such properties. However, the complexity of molecular structures and the diversity of molecular interactions can undermine prediction accuracy and hinder generalizability. In this context, identifying core invariant substructures (\textit{i.e.}, rationales) has become essential for enhancing interpretability and generalization. Despite notable efforts, existing models often neglect to model molecular pairs explicitly and therefore capture interaction relationships insufficiently. To address these limitations, we propose a novel framework, \textbf{I}nteraction-aware \textbf{I}nvariant \textbf{Mole}cular learning (I2Mole), for generalizable property prediction. I2Mole models atomic interactions, such as hydrogen bonds, by first establishing indiscriminate connections between intermolecular atoms, which are then refined using an improved graph information bottleneck theory tailored for merged graphs. To further enhance generalization, we construct an environment codebook from the environment subgraphs of the merged graph. This approach not only provides a noise source for optimizing mutual information but also preserves the integrity of chemical semantic information. By comprehensively leveraging the information inherent in the merged graph, our model accurately captures core substructures and significantly enhances generalization. Extensive experimental validation demonstrates the efficacy and generalizability of I2Mole. The implementation code is available.
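To make the first step concrete, the following is a minimal sketch of how a merged graph with indiscriminate intermolecular connections could be built from two molecular graphs. It is an illustration under stated assumptions, not the I2Mole implementation: the function name `merge_graphs`, the edge-list representation, and the toy molecules are all hypothetical, and the later refinement via the graph information bottleneck is not shown.

```python
from itertools import product


def merge_graphs(n_a, edges_a, n_b, edges_b):
    """Merge two molecular graphs into one (hypothetical helper).

    Atoms of molecule A keep indices 0..n_a-1; atoms of molecule B are
    shifted to n_a..n_a+n_b-1. Returns the node count, the intramolecular
    edges (original bonds), and the intermolecular candidate edges.
    """
    # Keep the bonds of A and re-index the bonds of B into the merged graph.
    intra = list(edges_a) + [(u + n_a, v + n_a) for u, v in edges_b]
    # Assumption: "indiscriminate" means every atom of A is connected to
    # every atom of B; a later stage would prune these candidate edges.
    inter = [(i, n_a + j) for i, j in product(range(n_a), range(n_b))]
    return n_a + n_b, intra, inter


# Toy example: a 3-atom chain paired with a 2-atom molecule.
num_nodes, intra_edges, inter_edges = merge_graphs(
    3, [(0, 1), (1, 2)], 2, [(0, 1)]
)
```

In this sketch the intermolecular edges are kept separate from the bonds so that a refinement step can score or remove them without touching the original molecular structure.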