Multimodal Latent Causal VAE for Joint Inference of Gene Regulatory and Protein Interaction Networks
Abstract
Learning directed causal relationships between genes and proteins from observational single-cell data remains a fundamental challenge, because correlation-based methods cannot distinguish cause from effect. We introduce CrossModal-CausalVAE, a variational autoencoder that embeds a structural causal model (SCM) in its latent space, learning directed acyclic graphs (DAGs) within each modality and directed connections across modalities. Linear decoders project the learned latent causal structure onto the observed gene and protein features, keeping it interpretable. Applied to PBMC CITE-seq data and constrained by biological priors, the model learns four interpretable directed networks: intra-modal gene regulatory and protein interaction graphs, plus cross-modal translation and feedback matrices. In silico interventions confirm causal asymmetry in the intra-modal and RNA→Protein blocks. Critically, we validate the learned causal structure zero-shot against held-out Perturb-CITE-seq interventional data: the model achieves an RNA→Protein perturbation AUROC of 0.612, versus 0.534 for the gene-protein correlation baseline, and its SCM interventions yield strong gene-gene perturbation prediction. Together, these results demonstrate that the model captures causal structure beyond pairwise correlation from purely observational data. Cross-referencing the top-ranked causal pathways with the published literature further confirms that the model recovers established immunological mechanisms. Our work provides a proof of concept that latent causal modeling paired with linear decoding enables interpretable and efficient causal graph inference in a multimodal setting.
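The in silico intervention idea summarized above can be illustrated with a minimal sketch, assuming a linear latent SCM z = Aᵀz + ε in which A[i, j] is the weight of the edge z_i → z_j, a do-operator that cuts all edges into the intervened latent, and the polynomial acyclicity score tr[(I + A∘A/d)^d] − d (the DAG-GNN form) as a stand-in for whatever acyclicity constraint the model actually uses. The four-latent block layout, the toy weights, the decoder matrix `W`, and the helper names `acyclicity`, `solve_scm`, and `intervene` are all hypothetical illustrations, not the CrossModal-CausalVAE implementation.

```python
import numpy as np

# Hypothetical toy example; NOT the CrossModal-CausalVAE code.
# Latents 0-1 are RNA, latents 2-3 are protein.
# A[i, j] is the weight of the directed edge z_i -> z_j.
RNA, PROT = slice(0, 2), slice(2, 4)

def acyclicity(A):
    """Polynomial acyclicity score tr[(I + A*A/d)^d] - d (DAG-GNN form).
    It is zero exactly when the weighted graph A has no directed cycle."""
    d = A.shape[0]
    M = np.eye(d) + (A * A) / d
    return float(np.trace(np.linalg.matrix_power(M, d)) - d)

def solve_scm(A, eps):
    """Solve the linear SCM z = A^T z + eps for the latent vector z."""
    return np.linalg.solve(np.eye(A.shape[0]) - A.T, eps)

def intervene(A, eps, j, value):
    """Apply do(z_j = value): cut every edge into z_j, then clamp it."""
    A_do = A.copy()
    A_do[:, j] = 0.0          # z_j no longer depends on its parents
    eps_do = eps.copy()
    eps_do[j] = value         # with no parents, z_j equals this term
    return solve_scm(A_do, eps_do)

# Toy weights (illustrative only, not learned values).
A = np.zeros((4, 4))
A[0, 1] = 0.8   # intra-RNA edge      (RNA block)
A[3, 2] = 0.5   # intra-protein edge  (protein block)
A[1, 2] = 0.9   # RNA -> protein      (translation block)
A[3, 0] = 0.2   # protein -> RNA      (feedback block)

# Hypothetical linear decoder: genes load on RNA latents,
# proteins load on protein latents.
W = np.array([[1.0, 0.0, 0.0, 0.0],   # gene 1
              [0.5, 1.0, 0.0, 0.0],   # gene 2
              [0.0, 0.0, 1.0, 0.0],   # protein 1
              [0.0, 0.0, 0.3, 1.0]])  # protein 2

eps = np.zeros(4)                      # noise-free baseline
z_obs = solve_scm(A, eps)

# In this sketch only the intra-modal blocks are checked for acyclicity.
print(acyclicity(A[RNA, RNA]), acyclicity(A[PROT, PROT]))

# Intervening on an RNA latent propagates into the protein block ...
dz_rna = intervene(A, eps, 1, 1.0) - z_obs
# ... but intervening on its protein target does not flow back to RNA.
dz_prot = intervene(A, eps, 2, 1.0) - z_obs

# Linear decoding turns latent shifts into feature-level effects.
print(np.round(W @ dz_rna, 3))   # effect of do(z_1 = 1) on features
print(np.round(W @ dz_prot, 3))  # effect of do(z_2 = 1) on features
```

In this toy setup, do(z_1 = 1) shifts both protein features through the translation edge, whereas do(z_2 = 1) leaves both gene features unchanged, which is the kind of directional asymmetry an in silico intervention test looks for. Because the decoder is linear, each latent shift maps to feature-level effects through a single matrix product, which is what keeps the projected structure readable at the gene and protein level.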