Configuration Perturbation Induces Logical Contradictions Across Related Queries
Raghav Subramaniam
Abstract
Logical reasoning evaluations score responses independently under a single configuration. We ask whether answers to logically related questions remain mutually consistent when the system prompt or chain-of-thought elicitation varies between queries. We introduce a protocol that queries models on 120 question pairs (deductive, inductive, abductive) under six configurations and checks answer pairs for logical compatibility, reporting both a same-configuration baseline and a cross-configuration condition to isolate the perturbation effect. Across four models, cross-configuration per-check contradiction rates are roughly double the same-configuration baselines ($p < 0.001$ pooled, $\chi^2$ test), indicating that configuration changes induce contradictions beyond those attributable to intrinsic model inconsistency. Abductive pairs are the most fragile. Chain-of-thought (CoT) prompting reduces deductive contradictions but increases abductive ones; a decomposition shows that CoT both worsens abductive consistency within a fixed configuration and makes it more sensitive to configuration changes. We argue that a model's logical commitments should not shift with surface-level configuration changes, and that cross-query consistency under perturbation is a missing axis in reasoning evaluation.
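As an illustration of the comparison described above, the following minimal sketch shows one way the same-configuration and cross-configuration conditions could be enumerated and their contradiction counts compared with a pooled $\chi^2$ test. It is a sketch under stated assumptions, not the protocol's actual implementation: the function names (`config_pairs`, `contradiction_rate`, `chi2_2x2`) are hypothetical, the use of ordered configuration pairs is an assumption, and the test shown is the plain Pearson statistic on a 2x2 table without continuity correction.

```python
import math
from itertools import product


def config_pairs(configs):
    """Split all ordered configuration pairs into same- and cross-configuration sets.

    Assumption: the two questions of a pair may each be asked under any
    configuration, so order matters (q1 under A, q2 under B differs from
    q1 under B, q2 under A).
    """
    same = [(c, c) for c in configs]
    cross = [(a, b) for a, b in product(configs, repeat=2) if a != b]
    return same, cross


def contradiction_rate(checks):
    """Fraction of compatibility checks flagged contradictory (True = contradiction)."""
    return sum(checks) / len(checks)


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test on the table [[a, b], [c, d]] with 1 degree of freedom.

    Rows: condition (same vs. cross configuration).
    Columns: outcome (contradiction vs. compatible).
    Returns (statistic, p_value); for df = 1 the survival function is
    erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Placeholder configuration labels; the paper's six configurations are not named here.
    same, cross = config_pairs(["cfg1", "cfg2", "cfg3", "cfg4", "cfg5", "cfg6"])
    print(len(same), len(cross))

    # Made-up counts purely to exercise the function, not results from the paper.
    stat, p = chi2_2x2(10, 90, 20, 80)
    print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

Reporting the same-configuration rate alongside the cross-configuration rate is what lets the excess be attributed to the perturbation rather than to intrinsic inconsistency; the test above only asks whether the two rates differ.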