LLM-BMC: Resolving Cell Type Ambiguity through Bayesian Integration of Biological Knowledge
Chenxi He
Abstract
Distinguishing transcriptionally similar cell types remains a core challenge in single-cell RNA sequencing, as standard classifiers struggle when cell types share marker genes. We introduce LLM-Enhanced Bayesian Model Combination (LLM-BMC), a framework that uses large language models to generate structured biological arguments about marker specificity, pathway coherence, and literature support, then integrates these through Bayesian updates to refine classification probabilities. We quantify knowledge effectiveness using normalized information gain ($\lambda$), which directly predicts error reduction: $P_e^{(\text{after})} \approx (1-\lambda) \cdot P_e^{(\text{before})}$. On three scRNA-seq datasets, LLM-BMC improves F1-score by +0.030--0.037, with the largest gains on ambiguous cell types where base classifiers show high uncertainty.
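The relationship between the Bayesian update and $\lambda$ can be illustrated with a minimal sketch. The abstract does not define $\lambda$ or the update rule, so everything here is an assumption for illustration: $\lambda$ is taken to be the relative reduction in Shannon entropy of the class posterior, and the LLM-derived evidence is represented as one likelihood per cell type. The cell type names, likelihood values, and function names are hypothetical.

```python
import math

def bayes_update(prior, likelihoods):
    """Multiply class probabilities by per-class evidence likelihoods, then renormalize."""
    unnormalized = {c: p * likelihoods.get(c, 1.0) for c, p in prior.items()}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}

def entropy(dist):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def normalized_information_gain(prior, posterior):
    """ASSUMED definition of lambda: relative entropy reduction from prior to posterior.
    Can be negative if the evidence makes the posterior less certain."""
    h_before = entropy(prior)
    if h_before == 0:
        return 0.0
    return (h_before - entropy(posterior)) / h_before

def projected_error(p_e_before, lam):
    """Approximation stated in the abstract: P_e(after) ~ (1 - lambda) * P_e(before)."""
    return (1 - lam) * p_e_before

# Hypothetical ambiguous cell: the base classifier cannot separate two similar types.
prior = {"CD4 T": 0.5, "CD8 T": 0.5}
# Hypothetical likelihoods standing in for an LLM-generated biological argument.
likelihoods = {"CD4 T": 0.9, "CD8 T": 0.1}

posterior = bayes_update(prior, likelihoods)
lam = normalized_information_gain(prior, posterior)
print(posterior, lam, projected_error(0.2, lam))
```

Under this assumed definition, evidence that sharply favors one type on a maximally uncertain cell yields a large $\lambda$, which matches the abstract's observation that gains concentrate on ambiguous cell types.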