Poster in Workshop: Learning Meaningful Representations of Life (LMRL) Workshop @ ICLR 2025
Interpretable Self-Supervised Prototype Learning for Single-Cell Transcriptomics
Fatemeh S. Hashemi G. · Till Richter · Alejandro Tejada Lapuerta · Lennard Halle · Mohammad Lotfollahi · Fabian Theis
Single-cell transcriptomics is inherently noisy and sparse, posing significant challenges for uncovering underlying biological mechanisms. Addressing this issue requires effective denoising strategies that enhance the reliability of biological interpretation. Self-supervised learning has emerged as a powerful approach for learning robust representations across large single-cell datasets, improving denoising and enabling more accurate biological insights. In this work, we present scProto, an interpretable self-supervised learning framework that learns prototypes, which are subsequently decoded into metacells: denoised representations that aggregate information from multiple similar cells across datasets. These metacells enhance robustness, mitigate noise, and provide a more stable and biologically meaningful representation of cell states. Beyond denoising, scProto is designed to preserve the structural relationships of the k-nearest neighbor (KNN) graph in the input space while simultaneously removing batch effects through self-supervised prototype learning. The loss function ensures that all cell populations, including rare ones, are well represented by the prototypes. We demonstrate that scProto metacells effectively capture marker genes, leading to improved cell-type distinction. Model performance is evaluated with scGraph metrics, which assess how well cell similarity structures and geometric relationships are preserved in the embedding space; scProto generally outperforms other methods on these metrics. Batch-effect removal and biological conservation are further assessed with scIB metrics, which indicate that scProto performs on par with the best-performing models while better preserving structural relationships in the embedding space.
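To make the prototype-to-metacell idea concrete, the sketch below is a minimal, hypothetical PyTorch illustration, not the authors' implementation: an encoder embeds cells, each cell is softly assigned to learnable prototypes, and a decoder maps every prototype back to gene-expression space as a metacell. All names (ScProtoSketch, n_prototypes, temperature) and the loss terms are assumptions; the abstract does not specify scProto's architecture, and its KNN-structure-preservation and batch-correction loss terms are not reproduced here.

```python
# Hypothetical sketch of prototype-based self-supervised learning with a decoder
# that turns prototypes into "metacells" in gene-expression space. Names and loss
# terms are illustrative assumptions, not the scProto implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScProtoSketch(nn.Module):
    def __init__(self, n_genes: int, latent_dim: int = 64, n_prototypes: int = 100):
        super().__init__()
        # Encoder: noisy expression profile -> latent cell embedding
        self.encoder = nn.Sequential(
            nn.Linear(n_genes, 256), nn.ReLU(), nn.Linear(256, latent_dim)
        )
        # Learnable prototypes living in the latent space
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, latent_dim))
        # Decoder: prototype -> denoised expression profile (a metacell)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, n_genes)
        )

    def forward(self, x: torch.Tensor, temperature: float = 0.1):
        z = F.normalize(self.encoder(x), dim=-1)        # cell embeddings
        protos = F.normalize(self.prototypes, dim=-1)   # normalized prototypes
        # Soft assignment of each cell to prototypes (cosine similarity / temperature)
        assign = F.softmax(z @ protos.t() / temperature, dim=-1)
        # Metacells: decode every prototype back to gene-expression space
        metacells = self.decoder(self.prototypes)
        # Reconstruct each cell as its assignment-weighted mixture of metacells
        recon = assign @ metacells
        return z, assign, metacells, recon


def sketch_loss(x, recon, assign):
    # Reconstruction keeps metacells close to the cells they summarize (denoising)
    recon_loss = F.mse_loss(recon, x)
    # Entropy of the average assignment encourages all prototypes to be used,
    # a stand-in for the paper's goal of representing rare populations
    mean_assign = assign.mean(dim=0)
    usage_entropy = -(mean_assign * (mean_assign + 1e-8).log()).sum()
    return recon_loss - 0.01 * usage_entropy


# Usage on random data standing in for a normalized expression matrix
model = ScProtoSketch(n_genes=2000)
x = torch.randn(128, 2000)
z, assign, metacells, recon = model(x)
loss = sketch_loss(x, recon, assign)
loss.backward()
```

In this toy setup the decoded prototypes play the role of metacells, and the usage-entropy term is only a placeholder for the abstract's stated requirement that rare populations remain well represented; the actual scProto objective additionally preserves the input-space KNN graph and removes batch effects.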