Commitment-Aware Axiomatic Coherence: Measuring Non-Vacuous Consistency in LLM Logical Reasoning
Abstract
Large language models (LLMs) are increasingly used for logical tasks, yet they frequently contradict themselves across closely related queries. A natural response is to measure logical coherence by checking axioms such as negation consistency. However, we show that coherence can be vacuous: a model can appear consistent by refusing to commit to either a statement or its negation. We propose commitment-aware axiomatic coherence, a lightweight evaluation protocol that complements a standard negation-coherence check with a commitment score measuring how much probability mass the model assigns to entailed vs. refuted outcomes (as opposed to abstention/uncertainty). Using a deterministic log-probability elicitation procedure (YES/NO) and a simple 3-way decision rule (True/False/Uncertain), we evaluate four open LLMs on the public FOLIO v0.0 validation split. Results reveal a clear frontier: some models achieve low contradiction rates primarily by abstaining (low coverage), while others achieve high coverage at the cost of pervasive negation-coherence violations. Our findings argue that reliable logical reasoning evaluation requires reporting both coherence and non-vacuous commitment, not coherence alone. The project is available at https://meherabb.github.io/Commitment/
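To make the distinction between vacuous and non-vacuous coherence concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes that each statement S and its negation are scored with a pair of YES/NO log-probabilities, that these are renormalized into P(YES), and that a symmetric threshold `tau` yields the three-way label. The function names, the threshold value, and the exact definitions of `coverage` and `violation_rate` are hypothetical choices made for illustration only.

```python
import math

def yes_probability(logp_yes, logp_no):
    """Renormalize YES/NO log-probabilities into P(YES) (assumed elicitation format)."""
    m = max(logp_yes, logp_no)          # subtract the max for numerical stability
    e_yes = math.exp(logp_yes - m)
    e_no = math.exp(logp_no - m)
    return e_yes / (e_yes + e_no)

def decide(p_yes, tau=0.7):
    """Hypothetical 3-way rule: commit only when P(YES) is far enough from 0.5."""
    if p_yes >= tau:
        return "True"
    if p_yes <= 1.0 - tau:
        return "False"
    return "Uncertain"

def evaluate(pairs, tau=0.7):
    """pairs: list of ((logp_yes, logp_no) for S, (logp_yes, logp_no) for not-S).

    Returns two illustrative quantities:
      coverage       - fraction of pairs where the model commits on both S and not-S
      violation_rate - fraction of pairs where both receive the same committed label
    """
    if not pairs:
        raise ValueError("pairs must be non-empty")
    committed = 0
    violations = 0
    for s_scores, neg_scores in pairs:
        label_s = decide(yes_probability(*s_scores), tau)
        label_neg = decide(yes_probability(*neg_scores), tau)
        if label_s != "Uncertain" and label_neg != "Uncertain":
            committed += 1
            if label_s == label_neg:    # S and not-S both True, or both False
                violations += 1
    n = len(pairs)
    return {"coverage": committed / n, "violation_rate": violations / n}
```

Under these assumptions, a model that labels every statement Uncertain gets a violation rate of zero with zero coverage, which is the vacuous case described above. This is why the sketch reports both numbers together.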