Discrete Diffusion for Single-Cell Gene Expression Modeling
Sanjukta Bhattacharya ⋅ Christian Gensbigler ⋅ Shaamil Karim Shaw Alem ⋅ Jon Lees
Abstract
Current generative models of single-cell transcriptomics rely on continuous latent representations, transforming inherently discrete and sparse gene counts into a continuous space. We propose Discrete Cell Models (DCM), a diffusion-based framework that learns cellular representations directly in the discrete domain. Our framework supports both unconditional and conditional generation, allowing precise modeling of complex biological scenarios such as cell-type-specific transcriptional responses to genetic perturbations. We demonstrate that DCM scales effectively and performs strongly against current state-of-the-art methods, including scVI, CPA, STATE, scGPT, and scLDM. On the Dentate Gyrus benchmark, DCM achieves a 5-fold improvement in $\mathrm{MMD}^2$ (RBF) and a nearly 2-fold improvement in $W_2$ distance over the leading continuous diffusion baseline (scLDM). On the conditional Replogle perturbation benchmark, DCM sets a new state of the art on $W_2$ distance while remaining competitive on $\mathrm{MMD}^2$ (RBF). These results establish discrete diffusion as a promising and principled paradigm for generative modeling of single-cell transcriptomics.