
Workshop: Machine Learning for Drug Discovery (MLDD)

Contrastive learning of image- and structure-based representations in drug discovery

Ana Sanchez-Fernandez · Elisabeth Rumetshofer · Sepp Hochreiter · Günter Klambauer

Keywords: [ deep learning ] [ contrastive learning ] [ drug discovery ]


Contrastive learning for self-supervised representation learning has substantially improved many application areas, such as computer vision and natural language processing. With the availability of large collections of unlabeled data in vision and language, contrastive learning of language and image representations has produced impressive results. The contrastive learning methods CLIP and CLOOB have demonstrated that the learned representations are highly transferable to a large set of diverse tasks when trained on multi-modal data from two different domains. In drug discovery, similar large, multi-modal datasets comprising both cell-based microscopy images and chemical structures of molecules are available. However, contrastive learning has not been used for this type of multi-modal data in drug discovery, although transferable representations could be a remedy for the time-consuming and costly label acquisition in this domain. In this work, we present a contrastive learning method for image-based and structure-based representations of small molecules for drug discovery. Our method, Contrastive Leave-One-Out boost for Molecule Encoders (CLOOME), comprises an encoder for microscopy data, an encoder for chemical structures, and a contrastive learning objective. On the benchmark dataset "Cell Painting", we demonstrate the ability of our method to learn proficient representations by performing linear probing for activity prediction tasks.
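To make the idea of a contrastive objective over two modalities concrete, the following is a minimal sketch of a symmetric, CLIP-style InfoNCE loss over paired image and molecule embeddings. It is an illustration only and not the CLOOME objective: CLOOB-style methods use a different loss (InfoLOOB) together with Hopfield networks, which are omitted here. The function names, the toy embeddings, and the temperature value are assumptions made for this example; in practice the embeddings would come from the microscopy and chemical-structure encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mean_diagonal_cross_entropy(logits):
    """Mean of -log softmax(row_i)[i]: each row's positive is on the diagonal."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(image_emb, mol_emb, temperature=0.1):
    """CLIP-style loss (assumed stand-in, not InfoLOOB).

    image_emb[i] and mol_emb[i] are embeddings of the same sample;
    all other pairings in the batch act as negatives.
    """
    img = [l2_normalize(v) for v in image_emb]
    mol = [l2_normalize(v) for v in mol_emb]
    # Cosine similarity of every image with every molecule, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(u, w)) / temperature for w in mol]
        for u in img
    ]
    logits_t = [list(col) for col in zip(*logits)]  # molecule-to-image direction
    return 0.5 * (
        mean_diagonal_cross_entropy(logits)
        + mean_diagonal_cross_entropy(logits_t)
    )


# Toy batch of two samples (hypothetical values).
images = [[1.0, 0.0], [0.0, 1.0]]
molecules_matched = [[2.0, 0.0], [0.0, 3.0]]
molecules_swapped = [[0.0, 3.0], [2.0, 0.0]]

loss_matched = symmetric_info_nce(images, molecules_matched)
loss_swapped = symmetric_info_nce(images, molecules_swapped)
```

In this toy batch, the loss is lower when each image is paired with its own molecule than when the pairing is swapped, which is the signal that pulls matching image and structure embeddings together during training.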
