Revisiting Causal Reasoning in Language Models through Controlled Synthetic Worlds
Abstract
Evidence on whether LLMs can reason causally remains mixed, partly because existing benchmarks either allow retrieval-based shortcuts from pretraining or rely on in-context synthetic stories that are weakly aligned with how models acquire knowledge. We present a controlled synthetic-world benchmark that mirrors LLMs’ training setting: we generate a causal world with known DAG structure and Boolean mechanisms, textualize it into demonstrations, and fine-tune LLMs before evaluating them on three task families (simple prediction, L1 associational reasoning, and L2 interventional reasoning). Unlike prior benchmarks, our framework provides training observations from a structural causal model, which lets us identify specific causal reasoning abilities as the training data mix is varied. Across experiments, models learn individual causal mechanisms and can generalize to shifted distributions when some examples from those distributions are seen during training. However, they struggle to compose novel causal chains, generalize to new scenario structures, and transfer knowledge across related tasks. These results suggest that current LLMs internalize local causal information without forming an accurate internal causal model. They also help explain prior mixed findings: current LLMs, trained on large and diverse data, can achieve improved performance on many benchmarks, but systematic generalization beyond seen distributions remains limited.
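To make the setup concrete, the following is a minimal sketch of a Boolean structural causal model over a small DAG, sampled observationally (the L1 setting) and under a do-intervention (the L2 setting), then rendered as text. It is an illustration under stated assumptions, not the benchmark's actual generator: the three-variable chain, the mechanisms, the helper names `sample_world` and `textualize`, and the sentence template are all hypothetical.

```python
import random

# Hypothetical chain-shaped world: a -> b -> c (not taken from the paper).
PARENTS = {"a": [], "b": ["a"], "c": ["b"]}
MECHANISMS = {
    "b": lambda a: a,        # b copies its parent a
    "c": lambda b: not b,    # c negates its parent b
}
ORDER = ["a", "b", "c"]      # a topological order of the DAG


def sample_world(rng, do=None):
    """Sample one assignment; `do` maps variables to forced values."""
    do = do or {}
    values = {}
    for var in ORDER:
        if var in do:
            # Intervention: ignore the variable's parents and mechanism.
            values[var] = do[var]
        elif not PARENTS[var]:
            # Root variable: fair coin flip (an assumed prior).
            values[var] = rng.random() < 0.5
        else:
            values[var] = MECHANISMS[var](*(values[p] for p in PARENTS[var]))
    return values


def textualize(values):
    """Render an assignment as a plain-text demonstration (assumed template)."""
    return " ".join(
        f"{name} is {'true' if val else 'false'}." for name, val in values.items()
    )


rng = random.Random(0)
observed = [sample_world(rng) for _ in range(200)]
intervened = [sample_world(rng, do={"b": True}) for _ in range(200)]

# Observation: seeing b = true implies a = true, since b copies a.
assert all(s["a"] for s in observed if s["b"])
# Intervention: forcing b = true fixes c but carries no information about a.
assert all(s["b"] and not s["c"] for s in intervened)
```

In this toy world, conditioning on `b` and intervening on `b` give the same answer for the downstream variable `c` but different answers for the upstream variable `a`, which is the distinction between associational and interventional queries that the task families are meant to probe.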