Selective Enforcement of Order-Invariant Causal Reasoning in Language Models
Abstract
If we accept the statements (A causes B, B causes C), then the conclusions we draw from these relations should not depend on the order in which they are presented. The reordered sequence (B causes C, A causes B) describes the same causal graph and should therefore yield identical downstream judgments. We refer to this requirement as order-invariant causal consistency. Prior work has shown that language models violate this requirement in a variety of contexts, particularly when asked to reason about hypothetical outcomes. We introduce a methodology for selectively enforcing causal constraints in language models and apply it to this problem. We first construct a narrowly targeted diagnostic, the Textual Causal Invariance Test (TCIT), to isolate failures of order-invariant consistency. We then apply a lightweight training procedure that penalizes order-dependent preferences and reinforces order-invariant reasoning. Applied to the open-weight Phi-3 model, this intervention raises TCIT accuracy from 59% (modestly above chance) to 98% without degrading performance on a suite of regression tests. Furthermore, we demonstrate zero-shot transfer to the natural-language CLadder benchmark, with statistically significant improvements specifically on Rung-3 (counterfactual) causal reasoning tasks and no degradation on the lower causal rungs. These results show that violations of order-invariant causal consistency can be isolated and corrected through targeted enforcement of a single structural constraint. More broadly, they suggest that selectively enforcing well-defined causal principles may offer a practical path toward improving causal reasoning in language models.
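To make the order-invariance criterion concrete, here is a minimal Python sketch. It assumes a model is exposed as a hypothetical function mapping (premise text, question) to an answer string, renders every ordering of the premises, and checks that all answers agree. The helper names (`render`, `is_order_invariant`, `graph_answer`, `first_premise_answer`) are illustrative assumptions, not the TCIT implementation. The two toy "models" stand in for a real language model such as Phi-3: one reasons over the causal graph, and the other uses an order-dependent shortcut.

```python
import re
from itertools import permutations
from typing import Callable, Sequence, Tuple

# Hypothetical sketch of an order-invariance check; not the TCIT implementation.
Edge = Tuple[str, str]  # (cause, effect)
EDGE_RE = re.compile(r"(\w+) causes (\w+)\.")
QUESTION_RE = re.compile(r"Does (\w+) cause (\w+)\?")


def render(premises: Sequence[Edge]) -> str:
    """Render causal premises as text, e.g. 'A causes B. B causes C.'"""
    return " ".join(f"{c} causes {e}." for c, e in premises)


def is_order_invariant(
    answer: Callable[[str, str], str], premises: Sequence[Edge], question: str
) -> bool:
    """True if every ordering of the premises yields the same answer."""
    answers = {answer(render(p), question) for p in permutations(premises)}
    return len(answers) == 1


def graph_answer(text: str, question: str) -> str:
    """Toy order-invariant 'model': answers via reachability in the causal graph."""
    src, dst = QUESTION_RE.match(question).groups()
    edges = EDGE_RE.findall(text)
    reached, frontier = set(), [src]
    while frontier:
        node = frontier.pop()
        for cause, effect in edges:
            if cause == node and effect not in reached:
                reached.add(effect)
                frontier.append(effect)
    return "yes" if dst in reached else "no"


def first_premise_answer(text: str, question: str) -> str:
    """Toy order-dependent 'model': answers from the first premise only."""
    src, _ = QUESTION_RE.match(question).groups()
    first_cause = EDGE_RE.findall(text)[0][0]
    return "yes" if first_cause == src else "no"


if __name__ == "__main__":
    premises = [("A", "B"), ("B", "C")]
    question = "Does A cause C?"
    print(is_order_invariant(graph_answer, premises, question))
    print(is_order_invariant(first_premise_answer, premises, question))
```

Under these assumptions, the graph-based stand-in gives the same answer for (A causes B, B causes C) and (B causes C, A causes B), while the first-premise shortcut flips its answer when the premises are swapped. That flip is the kind of order-dependent failure the diagnostic is meant to isolate.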