Governed Self-Improvement for Logical Reasoning: Edit-Time Governance for Developmental Consistency
Abstract
Self-refinement methods enable large language models to improve without retraining, yet they optimize local answers rather than the future reasoner. In logical reasoning, every answer creates longitudinal commitments: paraphrases, negations, implication chains, and premise permutations must remain jointly consistent across developmental time. We present a governance-oriented framework and evaluation lens with proof-of-concept validation on a controlled propositional-logic domain. (1) We frame self-improvement as a commitment-management problem and show that uncontrolled search can increase contradictions even while raising accuracy. (2) We propose GSI-LR (Governed Self-Improvement for Logical Reasoning), a framework combining branch-diverse proposal search, a temporal contradiction graph (TCG) grounded in AGM-style belief revision, an axiomatic validation cascade using symbolic solvers at edit time, and an explicit edit-rights policy. (3) We introduce Developmental Consistency Evaluation (DCE), a protocol that measures family contradiction rate (FCR; lower is better), acceptance precision, delayed regression, rollback burden, and maintenance debt over trajectories rather than snapshots. (4) We validate GSI-LR on a Z3-grounded propositional-logic domain (200 questions, 40 families, 50 edit rounds, 5 seeds), demonstrating that governed development occupies a favorable position on the accuracy–consistency Pareto frontier: it reduces FCR by 8.8% relative to static baselines (0.675 vs. 0.740) while maintaining strict non-regression, whereas unconstrained search achieves perfect accuracy at the cost of increased contradictions (FCR 0.775).
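To make the FCR figures concrete, the following is a minimal sketch, not the paper's implementation. It assumes that FCR is the fraction of question families containing at least one contradiction, and that each family member is labeled by how its truth value should relate to the family's base question. The abstract does not spell out this definition. The relation labels (`SAME`, `OPPOSITE`), the function names, and the toy data are all hypothetical and chosen for illustration only. The last helper reproduces the relative-reduction arithmetic behind the reported 8.8%.

```python
from typing import Dict, List, Tuple

# Hypothetical relation labels (assumption, not from the paper):
# how a variant's answer should relate to the family's base question.
SAME = "same"          # e.g. paraphrase or premise permutation
OPPOSITE = "opposite"  # e.g. negation

# A family is a list of (question_id, relation_to_base) pairs.
Family = List[Tuple[str, str]]


def family_is_contradictory(family: Family, answers: Dict[str, bool]) -> bool:
    """Return True if the answers imply conflicting truth values for the base question."""
    implied_base_values = set()
    for question_id, relation in family:
        answer = answers[question_id]
        # Map each answer back to the base-question value it implies.
        implied_base_values.add(answer if relation == SAME else not answer)
    return len(implied_base_values) > 1


def family_contradiction_rate(families: List[Family], answers: Dict[str, bool]) -> float:
    """Assumed definition: share of families with at least one contradiction."""
    if not families:
        return 0.0
    contradictory = sum(family_is_contradictory(f, answers) for f in families)
    return contradictory / len(families)


def relative_reduction(baseline: float, governed: float) -> float:
    """Relative improvement of a lower-is-better metric over a baseline."""
    return (baseline - governed) / baseline


# Toy data: family 1 answers a statement and its negation identically
# (a contradiction); family 2 answers a statement and its paraphrase identically.
families = [
    [("q1", SAME), ("q1_neg", OPPOSITE)],
    [("q2", SAME), ("q2_para", SAME)],
]
answers = {"q1": True, "q1_neg": True, "q2": False, "q2_para": False}

print(family_contradiction_rate(families, answers))   # 0.5
print(round(relative_reduction(0.740, 0.675), 3))     # 0.088, i.e. about 8.8%
```

Under this assumed definition, a system can raise per-question accuracy while FCR also rises, because a single newly inconsistent answer is enough to mark its whole family as contradictory.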