CausalSim: Counterfactual Implication Inversion as a Logical Consistency Stress Test for Large Language Models
Abstract
Large language models (LLMs) achieve strong performance on reasoning benchmarks, yet their structural logical consistency remains poorly understood. In particular, it is unclear whether models track the direction of an implication when the logical structure is minimally inverted while the surface semantics remain nearly identical. We introduce CausalSim, a benchmark that evaluates counterfactual directional consistency as a stress test of logical reasoning in LLMs. The benchmark consists of paired implication hypotheses (A → B vs. B → A), which isolate sensitivity to implication reversal as a minimal structural perturbation. We propose two evaluation metrics: the Causal Advantage Index (CAI), which measures performance asymmetry under inversion, and Balanced-CAI, which captures cross-prompt logical consistency beyond raw accuracy. Across six instruction-tuned LLMs, we observe systematic implication-direction asymmetries, showing that high forward-direction accuracy does not guarantee structural logical robustness. Our findings position implication inversion as a minimal yet diagnostic probe of logical reasoning reliability in modern LLMs.