Poster in Workshop: Machine Learning for Genomics Explorations (MLGenX)

LLM-BMC: Resolving Cell Type Ambiguity through Bayesian Integration of Biological Knowledge

Chenxi He


Abstract

Distinguishing transcriptionally similar cell types remains a core challenge in single-cell RNA sequencing (scRNA-seq), because standard classifiers struggle when cell types share marker genes. We introduce LLM-Enhanced Bayesian Model Combination (LLM-BMC), a framework that uses large language models to generate structured biological arguments about marker specificity, pathway coherence, and literature support, and then integrates these arguments through Bayesian updates to refine classification probabilities. We quantify the effectiveness of this knowledge with the normalized information gain ($\lambda$), which directly predicts error reduction: $P_e^{(\text{after})} \approx (1-\lambda) \cdot P_e^{(\text{before})}$. On three scRNA-seq datasets, LLM-BMC improves the F1-score by 0.030 to 0.037, with the largest gains on ambiguous cell types where base classifiers show high uncertainty.
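To make the Bayesian update and the role of $\lambda$ concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes that each LLM-generated argument can be turned into a per-class likelihood vector, that arguments are combined by multiplying their likelihoods (a naive conditional-independence assumption), and that $\lambda$ is the relative entropy reduction from prior to posterior, $\lambda = (H(\text{prior}) - H(\text{posterior})) / H(\text{prior})$. The paper may define or aggregate $\lambda$ differently (for example, averaged over cells). All function names, class labels, and numbers below are hypothetical illustrations, not data from the paper.

```python
import math

def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def combine_likelihoods(argument_likelihoods):
    """Hypothetical: multiply per-argument likelihood vectors class by class,
    assuming arguments are conditionally independent given the cell type."""
    return [math.prod(per_class) for per_class in zip(*argument_likelihoods)]

def bayes_update(prior, likelihood):
    """Posterior is proportional to prior times likelihood, then normalized."""
    return normalize([p * l for p, l in zip(prior, likelihood)])

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def normalized_info_gain(prior, posterior):
    """Assumed definition: fraction of prior uncertainty removed by the update."""
    h_prior = entropy(prior)
    if h_prior == 0:
        return 0.0
    return (h_prior - entropy(posterior)) / h_prior

def predicted_error(error_before, lam):
    """The abstract's approximation: P_e(after) ~ (1 - lambda) * P_e(before)."""
    return (1 - lam) * error_before

# Hypothetical base-classifier probabilities for an ambiguous cell
# over three illustrative classes: ["CD4 T", "CD8 T", "NK"].
prior = [0.45, 0.40, 0.15]

# Hypothetical likelihoods derived from two LLM arguments
# (marker specificity, pathway coherence).
arguments = [
    [0.8, 0.3, 0.2],
    [0.7, 0.5, 0.4],
]

posterior = bayes_update(prior, combine_likelihoods(arguments))
lam = normalized_info_gain(prior, posterior)
print("posterior:", [round(p, 3) for p in posterior])
print("lambda:", round(lam, 3))
print("predicted error for a 0.20 baseline:", round(predicted_error(0.20, lam), 3))
```

Multiplying likelihoods is the simplest way to fuse several pieces of evidence in a Bayesian update; it overcounts evidence when arguments are correlated (for example, a marker argument and a literature argument citing the same gene), so a real system might temper or weight each argument.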
