Residual Drift Dominates Contradiction in Multi-Turn Constraint Reasoning
Abstract
Interactive reasoning systems can fail silently: the internal state stays consistent while the returned answer quietly violates prior commitments. We call this satisfiable drift and distinguish it from contradiction, where the maintained state itself becomes unsatisfiable. Current evaluations merge both failure modes into a single accuracy number, obscuring which class dominates and where repair effort should concentrate. We formalize the distinction in a solver-instrumented benchmark for multi-turn constraint reasoning with turn-level verification and trigger-conditioned repair. Across 816 test problems and four open-weight models, a repair method that feeds minimal unsatisfiable subsets back to the generator (MUS-Repair) is the strongest in every setting, with significant paired gains after false-discovery correction. The more important finding is what repair leaves behind: residual errors are overwhelmingly drift, not contradiction. Models rarely contradict themselves after repair; they forget. Performance also collapses with turn depth on every model, including the strongest. Contradiction localization is therefore necessary but not sufficient. Reliable multi-turn systems must separately instrument and address both channels.
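To make the two failure channels concrete, the following is a minimal illustrative sketch and not the benchmark's actual instrumentation. It assumes small finite integer domains and uses brute-force enumeration in place of a real solver. The names `find_model`, `classify_turn`, `domains`, and `committed` are hypothetical and introduced only for this example. A turn is labeled a contradiction when the accumulated constraints admit no model, and drift when a model exists but the returned answer violates at least one earlier commitment.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Optional

Assignment = Dict[str, int]
Constraint = Callable[[Assignment], bool]


def find_model(domains: Dict[str, Iterable[int]],
               constraints: List[Constraint]) -> Optional[Assignment]:
    """Brute-force stand-in for a solver: return any satisfying assignment, or None."""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        candidate = dict(zip(names, values))
        if all(c(candidate) for c in constraints):
            return candidate
    return None


def classify_turn(domains: Dict[str, Iterable[int]],
                  constraints: List[Constraint],
                  answer: Assignment) -> str:
    """Label one turn as 'contradiction', 'drift', or 'ok'."""
    # Contradiction: the maintained state itself has no model.
    if find_model(domains, constraints) is None:
        return "contradiction"
    # Drift: the state is satisfiable, but the answer breaks a prior commitment.
    if not all(c(answer) for c in constraints):
        return "drift"
    return "ok"


# Hypothetical two-variable example: commitments accumulated over earlier turns.
domains = {"x": range(4), "y": range(4)}
committed: List[Constraint] = [
    lambda a: a["x"] < a["y"],   # commitment from turn 1
    lambda a: a["y"] <= 2,       # commitment from turn 2
]

consistent_answer = classify_turn(domains, committed, {"x": 0, "y": 1})
drifting_answer = classify_turn(domains, committed, {"x": 2, "y": 1})
# Adding x >= 3 makes the state unsatisfiable (x < y <= 2 cannot hold).
contradictory_state = classify_turn(
    domains, committed + [lambda a: a["x"] >= 3], {"x": 3, "y": 1}
)
```

In this sketch the contradiction check runs first, so a turn is counted as drift only when the maintained state is still satisfiable, which keeps the two channels disjoint.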