Poster in Workshop: Workshop on Logical Reasoning of Large Language Models

KV Cache as a Reasoning Primitive for Long Context Reasoning

Rian Atri


Abstract

Large language models often produce inconsistent answers across multiple related questions when earlier premises are partially forgotten or distorted in long contexts. We argue this is not only a modeling issue but a working-memory issue: KV cache policy controls which premises remain accessible for attention and thus mediates logical consistency under finite memory. Current practice sits at two extremes: retain everything (wasteful) or evict uniformly (premise-destructive). This ignores decades of memory-hierarchy results on working sets and locality. We synthesize empirical evidence that attention working sets are sparse and structurally constrained (heavy hitters, attention sinks, layer heterogeneity), implying that premise-preserving retention is achievable. We provide a small proof-of-concept cache manager with content-aware retention and show favorable memory–quality tradeoffs on a premise-retrieval stress test (passkey retrieval). We then propose a “consistency bundle” evaluation protocol for measuring cross-question contradictions as a function of memory policy. Our conclusion is practical: memory policies should be designed and reported as reasoning controls, not just serving optimizations.
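
To make the idea of content-aware retention concrete, the following is a minimal sketch and not the authors' cache manager. It assumes a per-position accumulated attention score is available and combines the three structures named in the abstract in the simplest way: always keep a few initial "sink" positions, always keep a recent window, and spend the remaining budget on the highest-scoring "heavy hitter" positions. The function name `select_retained` and the parameters `n_sink` and `n_recent` are hypothetical, chosen only for illustration.

```python
from typing import List


def select_retained(acc_attn: List[float], budget: int,
                    n_sink: int = 4, n_recent: int = 8) -> List[int]:
    """Return the sorted cache positions to keep under a fixed budget.

    acc_attn[i] is an assumed accumulated attention score for position i.
    """
    n = len(acc_attn)
    if budget >= n:
        return list(range(n))  # everything fits, nothing to evict
    if budget < n_sink + n_recent:
        raise ValueError("budget must cover the sink and recent tokens")

    # Always-kept positions: initial sink tokens and the recent window.
    keep = set(range(n_sink)) | set(range(n - n_recent, n))

    # Remaining slots go to the positions with the highest accumulated
    # attention (the "heavy hitters"), which may hold earlier premises.
    candidates = [i for i in range(n) if i not in keep]
    candidates.sort(key=lambda i: acc_attn[i], reverse=True)
    keep.update(candidates[: budget - len(keep)])
    return sorted(keep)


# Illustrative scores only: positions 4 and 7 stand in for premises that
# later queries attend to heavily.
scores = [0.9, 0.8, 0.1, 0.05, 0.7, 0.02, 0.03, 0.6, 0.01, 0.04, 0.2, 0.3]
kept = select_retained(scores, budget=6, n_sink=2, n_recent=2)
```

In this toy example the mid-context positions 4 and 7 survive eviction, whereas a policy that kept only the six most recent positions would drop both. A real manager would also need to address the layer heterogeneity the abstract mentions, which this sketch ignores.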
