

Poster in Workshop: Generative and Experimental Perspectives for Biomolecular Design

Contrastive RNA Representation Learning Through Maximizing Mutual Information Between Splice Isoforms

Philip Fradkin · Ruian Shi · Keren Isaev · Caitlin Harrigan · Quaid Morris · Bo Wang · Brendan Frey · Leo J Lee


Abstract:

In the face of rapidly accumulating genomic data, our understanding of the RNA regulatory code remains incomplete. Recent self-supervised methods in other domains have demonstrated the ability to learn rules underlying the data-generating process, such as sentence structure in language. Inspired by this, we extend contrastive learning techniques to genomic data by utilizing functional similarities between sequences generated through alternative splicing and gene duplication. We introduce IsoCLR, a model trained on a novel dataset with a contrastive objective, enabling the learning of generalized RNA isoform representations. We validate representation utility on downstream tasks such as RNA half-life and mean ribosome load prediction. Our pre-training strategy yields competitive results using linear probing across six tasks, along with up to a two-fold increase in Pearson correlation in low-data conditions. Importantly, our exploration of the learned latent space reveals that our contrastive objective yields semantically meaningful representations, underscoring its potential as a valuable initialization technique for RNA property prediction.
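
To make the contrastive idea concrete, here is a minimal sketch of one common contrastive loss, InfoNCE, applied to paired embeddings. The abstract does not state which loss IsoCLR uses, so the choice of InfoNCE, the cosine similarity, the temperature value, and the names `info_nce`, `iso_a`, and `iso_b` are all assumptions made for illustration. Row `i` of each list stands for an embedding of a different isoform of the same gene (a positive pair); every other row serves as a negative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss (illustrative, not the authors' implementation).

    anchors[i] and positives[i] are assumed to be embeddings of two
    isoforms of the same gene; positives[j] for j != i act as negatives.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true partner.
        total += log_denom - logits[i]
    return total / n

# Toy 2-D embeddings for two hypothetical genes, two isoforms each.
iso_a = [[1.0, 0.0], [0.0, 1.0]]
iso_b = [[0.9, 0.1], [0.1, 0.9]]

aligned = info_nce(iso_a, iso_b)         # partners matched correctly
shuffled = info_nce(iso_a, iso_b[::-1])  # partners deliberately mismatched
print(aligned < shuffled)
```

The loss is low when each embedding is closest to its own partner and high when partners are mismatched, which is the pressure that pulls functionally related sequences together in the latent space.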
