CAUSALPERT: GROUNDING LLM HYPOTHESES IN REGULATORY NETWORKS FOR GENE PERTURBATION PREDICTION
Marc Boubnovski Martell ⋅ Josefa Stoisser ⋅ Lawrence Phillips ⋅ Aditya Misra ⋅ Robert Kitchen ⋅ Jesper Ferkinghoff-Borg ⋅ Jialin Yu ⋅ Philip Torr ⋅ Kaspar Märtens
Abstract
Predicting transcriptional responses to unseen genetic perturbations is essential for understanding gene regulation and prioritizing large-scale perturbation experiments. Existing approaches either rely on static, potentially incomplete knowledge graphs, or prompt language models for functionally similar genes, retrieving associations shaped by symmetric co-occurrence in scientific text rather than directed regulatory logic. We introduce CausalPert, a lightweight framework that encourages LLM agents to generate directed regulatory hypotheses rather than relying solely on functional similarity. Multiple agents independently propose candidate regulators with associated confidence scores; these are aggregated through a consensus mechanism that filters spurious associations, producing weighted neighborhoods for downstream prediction. We evaluate CausalPert on Perturb-seq benchmarks across four human cell lines. For perturbation prediction in low-data regimes ($N=50$ observed perturbations), CausalPert improves Pearson correlation by up to 10.5\% over similarity-based baselines. For experimental design, CausalPert-selected anchor genes outperform standard network centrality heuristics by up to 46\% in well-characterized cell lines.
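The consensus step described above can be illustrated with a minimal sketch. The abstract does not specify the aggregation rule, so everything here is an assumption made for illustration: the function name `aggregate_proposals`, the `min_votes` threshold, averaging confidence over all agents, and the placeholder gene names. It is not the authors' implementation.

```python
from collections import defaultdict


def aggregate_proposals(proposals, min_votes=2):
    """Combine per-agent regulator proposals into a weighted neighborhood.

    proposals: list with one dict per agent, mapping a candidate regulator
               to that agent's confidence in [0, 1].
    min_votes: hypothetical consensus threshold; candidates proposed by
               fewer agents are treated as spurious and dropped.
    Returns a dict of regulator -> weight, with weights summing to 1.
    """
    votes = defaultdict(list)
    for agent in proposals:
        for gene, confidence in agent.items():
            votes[gene].append(confidence)

    # Average over ALL agents, so a gene proposed by few agents scores lower
    # (an assumed scoring rule, not taken from the paper).
    n_agents = len(proposals)
    kept = {
        gene: sum(confs) / n_agents
        for gene, confs in votes.items()
        if len(confs) >= min_votes
    }

    total = sum(kept.values())
    if total == 0:
        return {}
    return {gene: score / total for gene, score in kept.items()}


# Placeholder gene names and made-up confidences, for illustration only.
agents = [
    {"GENE_A": 0.9, "GENE_B": 0.4},
    {"GENE_A": 0.8, "GENE_C": 0.3},
    {"GENE_A": 0.7, "GENE_B": 0.6},
]
weights = aggregate_proposals(agents, min_votes=2)
```

In this toy input, `GENE_C` is proposed by a single agent and is filtered out, while the remaining candidates receive normalized weights that a downstream predictor could use, for example to weight the observed responses of neighboring perturbations.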