Poster in Workshop: Workshop on Logical Reasoning of Large Language Models

Configuration Perturbation Induces Logical Contradictions Across Related Queries

Raghav Subramaniam


Abstract

Logical reasoning evaluations typically score each response independently under a single configuration. We ask whether answers to logically related questions remain mutually consistent when the system prompt or chain-of-thought (CoT) elicitation varies between queries. We introduce a protocol that queries models on 120 question pairs (deductive, inductive, abductive) under six configurations and checks answer pairs for logical compatibility, reporting both a same-configuration baseline and a cross-configuration condition to isolate the perturbation effect. Across four models, cross-configuration per-check contradiction rates are roughly double the same-configuration baselines ($p < 0.001$ pooled, $\chi^2$ test), indicating that configuration changes induce contradictions beyond those attributable to intrinsic model inconsistency. Abductive pairs are the most fragile. CoT prompting reduces deductive contradictions but increases abductive ones: a decomposition shows that CoT both worsens abductive consistency within a fixed configuration and makes it more sensitive to configuration changes. We argue that a model's logical commitments should not shift with surface-level configuration changes, and that cross-query consistency under perturbation is a missing axis in reasoning evaluation.
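
As a rough illustration of the comparison described in the abstract, the sketch below enumerates same-configuration and cross-configuration pairings of six configurations and runs a $\chi^2$ test on a 2x2 table of contradiction counts. It is a minimal sketch under stated assumptions, not the authors' implementation: the configuration names, the use of ordered pairings, the helper names `config_pairings` and `chi2_2x2`, and the counts in the usage lines are all hypothetical, and the test is computed without a continuity correction.

```python
import math
from itertools import product

# Hypothetical labels; the abstract only says there are six configurations.
CONFIGS = [f"cfg{i}" for i in range(6)]


def config_pairings(configs):
    """Split ordered (config_a, config_b) pairings into same vs. cross.

    The first question of a pair is asked under config_a and the second
    under config_b. Ordered pairings are an assumption of this sketch.
    """
    same = [(a, b) for a, b in product(configs, repeat=2) if a == b]
    cross = [(a, b) for a, b in product(configs, repeat=2) if a != b]
    return same, cross


def chi2_2x2(contra_same, ok_same, contra_cross, ok_cross):
    """Pearson chi-squared test (1 degree of freedom, no continuity
    correction) on a 2x2 table of contradiction counts.

    Returns (statistic, p_value). For 1 degree of freedom the upper tail
    probability is erfc(sqrt(statistic / 2)).
    """
    a, b, c, d = contra_same, ok_same, contra_cross, ok_cross
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


same, cross = config_pairings(CONFIGS)

# Illustrative counts only; these are NOT results from the poster.
stat, p_value = chi2_2x2(contra_same=10, ok_same=90,
                         contra_cross=20, ok_cross=80)
print(len(same), len(cross), round(stat, 3), round(p_value, 4))
```

Comparing the two rows of the table is what separates the perturbation effect from intrinsic inconsistency: the same-configuration row estimates how often a model contradicts itself with nothing changed, so only the excess in the cross-configuration row is attributed to the configuration change.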
