Poster in Workshop: AI4MAT-ICLR-2026: ICLR 2026 Workshop on AI for Accelerated Materials Design

Solvaformer: Unified Geometric Learning for Solubility-Aware Automated Synthesis

Jonathan Broadbent ⋅ Michael Bailey ⋅ Mingxuan Li ⋅ Abhishek Paul ⋅ Louis de Lescure ⋅ Paul Chauvin ⋅ Lorenzo Anele ⋅ Yasser Jangjou ⋅ Sven Jager

Abstract

Accurate prediction of small molecule solubility requires balancing physical fidelity with computational scalability. While geometric deep learning offers strong inductive biases for molecular systems, applying full SE(3)-equivariance to dynamic multi-component systems can introduce substantial computational overhead. We introduce Solvaformer, a graph transformer for solubility prediction that selectively grounds interactions in geometry. The architecture applies SE(3)-equivariant attention to rigid intramolecular structure, while modeling fluid intermolecular interactions through computationally efficient scalar attention. We train Solvaformer in a multi-task setting on a combined dataset of quantum-mechanical calculations (CombiSolv-QM) and experimental measurements (BigSolDB 2.0). Solvaformer demonstrates strong performance, approaching the DFT-based baseline while remaining end-to-end and scalable. We also compare against a simpler MPNN augmented with machine-learning interatomic potential (MLIP)-derived partial charges, which achieves slightly better predictive accuracy. This suggests that for scalar solubility prediction, high-quality electronic descriptors can provide an effective alternative to explicit equivariant processing. Nevertheless, Solvaformer remains the best-performing end-to-end model that does not rely on external feature-generation pipelines, and its attention maps retain chemically meaningful interpretability, including the ability to distinguish intra- from intermolecular hydrogen bonding. These results highlight two practical strategies for scalable solution-phase modeling: explicit geometric learning within the architecture, and invariant prediction supported by physics-informed descriptors.
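
The split between geometry-aware intramolecular attention and scalar intermolecular attention described in the abstract can be pictured with the minimal sketch below. It is not the authors' implementation: the function names, toy features, and coordinates are all hypothetical, and the intramolecular branch uses a simple distance-based (rotation- and translation-invariant) attention as a stand-in for true SE(3)-equivariant attention, which would additionally carry vector or tensor features.

```python
import math

def pair_masks(mol_ids):
    """Split ordered atom pairs into intra- and intermolecular boolean masks."""
    n = len(mol_ids)
    intra = [[i != j and mol_ids[i] == mol_ids[j] for j in range(n)] for i in range(n)]
    inter = [[mol_ids[i] != mol_ids[j] for j in range(n)] for i in range(n)]
    return intra, inter

def masked_softmax(scores, mask):
    """Row-wise softmax over entries where mask is True; other entries get weight 0."""
    out = []
    for row, mrow in zip(scores, mask):
        kept = [s for s, m in zip(row, mrow) if m]
        if not kept:  # atom with no allowed partner
            out.append([0.0] * len(row))
            continue
        mx = max(kept)  # subtract the max for numerical stability
        exps = [math.exp(s - mx) if m else 0.0 for s, m in zip(row, mrow)]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

def attend(weights, h):
    """Weighted sum of per-atom feature vectors."""
    n, d = len(h), len(h[0])
    return [[sum(weights[i][j] * h[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)]

def scalar_scores(h):
    """Scaled dot-product logits on invariant (scalar) features only."""
    d = len(h[0])
    return [[sum(a * b for a, b in zip(hi, hj)) / math.sqrt(d) for hj in h] for hi in h]

def distance_scores(pos):
    """Geometry-based logits: negative squared distance (ASSUMED stand-in,
    invariant to rotation/translation, not SE(3)-equivariant attention)."""
    return [[-sum((a - b) ** 2 for a, b in zip(pi, pj)) for pj in pos] for pi in pos]

# Hypothetical toy system: a two-atom solute (id 0) and a two-atom solvent (id 1).
mol_ids = [0, 0, 1, 1]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 3.0, 0.0], [1.0, 3.0, 0.0]]

intra, inter = pair_masks(mol_ids)
w_intra = masked_softmax(distance_scores(pos), intra)  # geometry within a molecule
w_inter = masked_softmax(scalar_scores(h), inter)      # scalars across molecules
update = [[a + b for a, b in zip(ra, rb)]
          for ra, rb in zip(attend(w_intra, h), attend(w_inter, h))]
```

In this sketch, coordinates enter only through the intramolecular mask, so the intermolecular branch never needs relative positions between solute and solvent atoms. That is one plausible reading of why scalar attention is the cheaper choice for the fluid part of the system.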
