Do LLMs Benefit From Their Own Words?
Abstract
In multi-turn conversations, large language models typically condition on the full dialogue transcript: both past user prompts and assistant responses. We revisit this design choice by comparing full-context prompting against three alternative, substantially reduced context configurations. Analyzing in-the-wild multi-turn conversations across three open reasoning models and one state-of-the-art model, we find that response quality can often be maintained with substantially less context, frequently achieving comparable performance even with up to a 10x reduction in context length. To understand this result, we observe that a substantial fraction of user turns in multi-turn conversations (36.4%) are self-contained, and that many follow-up requests can be addressed by seeing only the immediately preceding turn or the past user turns alone. Furthermore, we find that conditioning on their own past responses can expose models to context pollution, a phenomenon in which reasoning errors, hallucinations, or stylistic artifacts cascade across turns. Motivated by these findings, we design a context-filtering approach that selectively omits the assistant-side history. Taken together, these findings suggest the benefits of moving away from storing full conversation histories in context and toward keeping the active context window short and clean.
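
As a rough illustration of what "selectively omitting the assistant-side history" could look like in practice, the sketch below filters a chat transcript down to the user turns, optionally retaining only the most recent assistant response for follow-ups that refer back to it. The message schema and the `filter_context` helper are assumptions made for illustration, not the paper's actual implementation.

```python
from typing import Dict, List

# Assumed message format: {"role": "user" | "assistant", "content": str}
Message = Dict[str, str]


def filter_context(history: List[Message], keep_last_assistant: bool = False) -> List[Message]:
    """Drop assistant-side turns from the dialogue history.

    If keep_last_assistant is True, the most recent assistant response is
    retained (in its original position) so the model can still resolve
    follow-ups that reference the immediately preceding answer.
    """
    last_assistant_idx = max(
        (i for i, m in enumerate(history) if m["role"] == "assistant"),
        default=None,
    )
    kept = []
    for i, message in enumerate(history):
        if message["role"] == "user":
            kept.append(message)
        elif keep_last_assistant and i == last_assistant_idx:
            kept.append(message)
    return kept


# Example: only the user turns (and optionally the last assistant turn)
# would be sent to the model for the next request.
if __name__ == "__main__":
    transcript = [
        {"role": "user", "content": "Summarize this report."},
        {"role": "assistant", "content": "Here is a summary..."},
        {"role": "user", "content": "Now translate it to French."},
    ]
    print(filter_context(transcript, keep_last_assistant=True))
```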