Poster in Workshop: Learning Meaningful Representations of Life (LMRL) Workshop @ ICLR 2025

Multi-Modal Representation learning for molecules

Muhammad Arslan Masood ⋅ Markus Heinonen ⋅ Samuel Kaski


Abstract

Molecular representation learning is a fundamental challenge in AI-driven drug discovery. Traditional unimodal approaches rely solely on chemical structures and often fail to capture the biological context necessary for accurate toxicity and activity predictions. To address this, we propose a multimodal representation learning framework that integrates molecular data with biological modalities, including morphological features from Cell Painting assays and transcriptomic profiles from the LINCS L1000 dataset. Unlike traditional approaches that require complete triplets (molecule, morphological, genomic), our model requires only paired data, namely (molecule, morphological) and (molecule, genomic) pairs, making it more practical and scalable. Our approach uses contrastive learning to align molecular representations with biological data, even when no fully paired dataset is available. We evaluate our framework on the ChEMBL20 dataset using linear probing across 1,320 tasks, demonstrating improvements in predictive performance. By incorporating diverse biological modalities, our approach yields more robust and biologically informed molecular representations, enhancing the predictive power of AI models in drug discovery.
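
The pairwise contrastive alignment described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss, encoders, or hyperparameters, so everything here is an assumption: a symmetric InfoNCE (CLIP-style) objective is used as a stand-in, the function names (`info_nce`, `paired_multimodal_loss`) are hypothetical, and the inputs are assumed to be precomputed embeddings from separate batches of (molecule, morphological) and (molecule, genomic) pairs. The point is only to show that the two pairings can be summed into one objective without ever requiring a complete triplet.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def info_nce(z_a, z_b, temperature=0.1):
    # Symmetric InfoNCE: row i of z_a and row i of z_b form the positive pair;
    # every other row in the batch acts as a negative.
    # ASSUMPTION: this loss is a stand-in, not confirmed by the abstract.
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature
    idx = np.arange(len(z_a))
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # a -> b retrieval
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # b -> a retrieval
    return 0.5 * (loss_ab + loss_ba)

def paired_multimodal_loss(mol_for_morph, morph, mol_for_gene, gene,
                           temperature=0.1):
    # Hypothetical combined objective: the two batches may contain entirely
    # different molecules, so no (molecule, morphological, genomic) triplet
    # is ever needed.
    return (info_nce(mol_for_morph, morph, temperature)
            + info_nce(mol_for_gene, gene, temperature))

# Toy usage with random embeddings (dimension 16 is arbitrary).
rng = np.random.default_rng(0)
mol_a, morph = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
mol_b, gene = rng.normal(size=(6, 16)), rng.normal(size=(6, 16))
total = paired_multimodal_loss(mol_a, morph, mol_b, gene)
```

In this sketch the molecule encoder would be shared across both terms, so gradients from either pairing shape the same molecular representation; that sharing is again an assumption about how such a framework could be wired, not a description of the authors' architecture.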
