EVA-RNA: Scaling Cross-Species Transcriptomic Foundation Models for Immunology & Inflammation
Abstract
Recent studies have shown that transcriptomic foundation models often fail to outperform simple baselines on clinically relevant tasks, suggesting a disconnect between pretraining objectives and useful representations. To bridge this gap, we introduce EVA-RNA, a transformer model pretrained on a curated corpus of over 500k human and mouse samples, comprising bulk RNA-seq, microarray, and pseudobulked single-cell data, with a focus on Immunology & Inflammation. EVA-RNA exhibits clear power-law scaling from 7M to 300M parameters with no sign of plateauing, in contrast to prior reports of diminishing returns in single-species models. Moreover, pretraining improvements consistently translate into downstream performance, as measured by a holistic benchmark spanning drug discovery, preclinical translation, and clinical applications. Finally, we conduct explainability experiments to explore (i) the concepts encoded in EVA-RNA's representations, (ii) the structure of orthologous genes in latent space, and (iii) the evolution of intrinsic dimensionality across layers and throughout training.
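
As an illustration of what "power-law scaling" means operationally, the sketch below fits a curve of the form loss ≈ a · N^(−α) by linear regression in log-log space. It is a minimal example under stated assumptions, not the fitting procedure used for EVA-RNA: the function name `fit_power_law`, the intermediate model sizes, and the loss values are all hypothetical, and the data are synthetic rather than results from this work. Only the 7M and 300M endpoints come from the text above.

```python
import numpy as np

def fit_power_law(n_params, losses):
    """Fit loss ~ a * N**(-alpha) by least squares in log-log space.

    Returns (a, alpha). Hypothetical helper for illustration only.
    """
    log_n = np.log(np.asarray(n_params, dtype=float))
    log_l = np.log(np.asarray(losses, dtype=float))
    # A power law is a straight line in log-log coordinates:
    # log(loss) = log(a) - alpha * log(N)
    slope, intercept = np.polyfit(log_n, log_l, deg=1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic, noise-free values generated from a known power law.
# These are NOT measurements from the paper; the intermediate sizes are assumed.
sizes = np.array([7e6, 2e7, 6e7, 1.5e8, 3e8])
true_a, true_alpha = 5.0, 0.1
synthetic_losses = true_a * sizes ** (-true_alpha)

a_hat, alpha_hat = fit_power_law(sizes, synthetic_losses)
print(f"a = {a_hat:.3f}, alpha = {alpha_hat:.3f}")
```

On real measurements, a plateau would show up as systematic upward curvature of the residuals at the largest model sizes, so inspecting the residuals of such a fit is one simple way to check a "no sign of plateauing" claim.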