Spotlight in Workshop: Learning Meaningful Representations of Life (LMRL) Workshop @ ICLR 2026

A Transcriptomic Benchmark for Foundation Models in Immunology and Inflammation Drug Development

Karim Kanbi ⋅ Yannis Cattan ⋅ Pierre Marschall ⋅ Matthew Corney ⋅ Aziz Fouché ⋅ Vincent Bouget ⋅ Julien Duquesne

Abstract

Foundation models for transcriptomics are increasingly evaluated on technical metrics disconnected from drug development. We introduce an immunology and inflammation (I&I) benchmark of 35 tasks across 8 diseases, organized along the drug development pipeline: target discovery, preclinical translation, and clinical applications. Tasks span treatment response, clinical severity, molecular perturbations, and patient endotypes, with cross-species, cross-disease, and cross-platform transfer settings to test translational generalization. Patient sample sizes range from 9 to 713, reflecting the data-limited regimes typical of early clinical research. We evaluate general-purpose and domain-specific foundation models against statistical baselines. Foundation models achieve their largest gains on translational tasks (perturbation prediction and cross-species transfer), where baselines fail. Treatment outcome prediction and patient stratification also favor foundation models, whereas on clinical severity prediction feature-selected regression remains competitive. A domain-specific model (EVA) pretrained on I&I data outperforms general-purpose models across most task categories. Benchmark performance improves with the number of pretraining steps without saturating, suggesting that the benchmark can serve as a diagnostic for model development.
