Foundation Models as Physical Priors: Decoupling Geometric Reasoning from Small-Molecule Solubility Prediction
Abstract
Scientific foundation models offer a promising path to generalized physical reasoning, yet integrating them into specific property prediction tasks remains an open architectural challenge. We evaluate two paradigms for leveraging geometric foundation models in solution-phase chemistry: (1) an end-to-end strategy, for which we introduce Solvaformer, a hybrid SE(3)-equivariant transformer trained to learn geometric interactions from scratch, and (2) a decoupled strategy, in which a pre-trained interatomic potential (AIMNet2) serves as a frozen feature engine for a lightweight scalar network. On a large combined dataset of quantum-mechanical and experimental solubility data (BigSolDB 2.0), we find that the decoupled approach outperforms the bespoke end-to-end architecture while also training more efficiently. Our results suggest that scientific foundation models are most effective when used as composable physical priors: complex geometric reasoning is offloaded to specialized pre-trained backbones, and downstream models are left to focus on task-specific correlations. This modular "Simplicity at Scale" paradigm offers a robust blueprint for integrating classical scientific tools with modern deep learning.
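To make the decoupled strategy concrete, the following is a minimal sketch of the general pattern only, under loudly stated assumptions: a fixed random projection stands in for the frozen pre-trained backbone, and a closed-form ridge regression stands in for the lightweight scalar network. It does not call AIMNet2 and does not reproduce the architecture or data used in this work; every name in it (`frozen_features`, `fit_head`, `predict`, `W_FROZEN`) is hypothetical, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: a fixed random projection stands in for a frozen, pre-trained
# backbone. This is NOT the AIMNet2 API; it only plays the role of a
# feature engine whose parameters are never updated.
W_FROZEN = rng.normal(size=(8, 16))


def frozen_features(x):
    """Map raw inputs of shape (n, 8) to fixed features of shape (n, 16)."""
    return np.tanh(x @ W_FROZEN)


def fit_head(features, targets, l2=1e-6):
    """Fit the only trainable part: a ridge-regression head (closed form)."""
    d = features.shape[1]
    gram = features.T @ features + l2 * np.eye(d)
    return np.linalg.solve(gram, features.T @ targets)


def predict(x, head):
    """Frozen features followed by the lightweight scalar head."""
    return frozen_features(x) @ head


# Synthetic data (illustrative only; not solubility measurements).
x_train = rng.normal(size=(256, 8))
true_head = rng.normal(size=16)
y_train = frozen_features(x_train) @ true_head

backbone_before = W_FROZEN.copy()
head = fit_head(frozen_features(x_train), y_train)
y_pred = predict(x_train, head)
```

In this sketch only `head` is fitted; `W_FROZEN` is read but never written, which is the sense in which the backbone is frozen. Because the features depend only on the inputs, they can be computed once and cached, which is one plausible source of the training-efficiency gain that a decoupled design offers.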