ELISA: An Interpretable Hybrid Agent for Expression-Grounded Discovery in Single-Cell Genomics
Abstract
We present ELISA, a retrieval-augmented AI agent for interpretable, hypothesis-driven exploration of single-cell RNA sequencing (scRNA-seq) data. ELISA enables natural-language querying of cell populations through a query-conditioned retrieval framework that explicitly integrates semantic biological priors with expression-derived evidence. In semantic mode, cluster-level biological summaries are embedded using BioBERT to align user queries with ontology-supported annotations. Hybrid mode extends this approach by constructing a query-adaptive expression representation from semantically relevant clusters and combining it with scGPT-derived transcriptional embeddings, prioritizing cell populations that are both semantically relevant and transcriptionally coherent with the query intent. scGPT mode relies exclusively on transcriptional structure captured in the scGPT latent embedding space, emphasizing genes that dominantly shape expression-derived representations of retrieved clusters, independent of semantic annotations or curated biological knowledge. Finally, discovery mode contrasts dataset-specific expression signals with prior biological knowledge to surface context-shifted gene programs and generate cautious, data-grounded hypotheses. By explicitly separating retrieval, expression evidence, and language-model interpretation, ELISA prioritizes transparency and reproducibility over speculative inference. The system is designed as a human-in-the-loop analytical tool that supports expert reasoning and hypothesis generation rather than fully autonomous biological discovery.
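To make the hybrid retrieval idea concrete, the following is a minimal sketch, not ELISA's actual implementation. It assumes that each cluster already has a precomputed semantic embedding (for example, from BioBERT) and a precomputed expression embedding (for example, from scGPT), and that the query has been embedded in the semantic space. The function names `cosine` and `hybrid_rank`, the parameters `top_k` and `alpha`, the similarity-weighted averaging, and the linear score combination are all illustrative assumptions; the abstract does not specify how the two signals are combined.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def hybrid_rank(
    query_sem: Vector,
    cluster_sem: Dict[str, Vector],
    cluster_expr: Dict[str, Vector],
    top_k: int = 2,
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank clusters by a blend of semantic and expression similarity.

    Hypothetical scheme (an assumption, not the paper's stated method):
      1. score every cluster semantically against the query;
      2. average the expression embeddings of the top_k semantic hits,
         weighted by semantic score, into a query-adaptive representation;
      3. score every cluster's expression embedding against it;
      4. combine the two scores as alpha * semantic + (1 - alpha) * expression.
    """
    if not cluster_sem:
        return []

    # Step 1: semantic similarity between the query and each cluster summary.
    sem = {c: cosine(query_sem, v) for c, v in cluster_sem.items()}

    # Step 2: query-adaptive expression representation from the top semantic hits.
    seeds = sorted(sem, key=lambda c: sem[c], reverse=True)[:top_k]
    weights = [max(sem[c], 0.0) for c in seeds]
    total = sum(weights)
    if total == 0.0:  # fall back to a plain mean if no seed has positive similarity
        weights = [1.0] * len(seeds)
        total = float(len(seeds))
    dim = len(cluster_expr[seeds[0]])
    prototype = [
        sum(w * cluster_expr[c][i] for w, c in zip(weights, seeds)) / total
        for i in range(dim)
    ]

    # Step 3: transcriptional coherence of each cluster with that representation.
    expr = {c: cosine(prototype, cluster_expr[c]) for c in cluster_sem}

    # Step 4: linear combination of the two evidence sources.
    scores = {c: alpha * sem[c] + (1.0 - alpha) * expr[c] for c in cluster_sem}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this sketch, setting `alpha=1.0` ignores the expression score and ranks on semantic similarity alone, while lower values let a cluster with a weak annotation match rise in the ranking if its expression embedding resembles that of the top semantic hits. Keeping the two scores separate until the final step also makes it possible to report each contribution to the user, in keeping with the separation of retrieval and expression evidence described above.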