Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis
Abstract
Attention patterns play a crucial role in both the training and inference of large language models (LLMs). Prior work has identified individual patterns, such as retrieval heads, sink heads, and diagonal traces, but these observations remain fragmented and lack a unifying explanation. To bridge this gap, we propose a unifying framework that explains the existence of diverse attention patterns by analyzing their underlying mathematical formulations from a continuous temporal perspective. This perspective both deepens the understanding of attention behavior and informs inference-acceleration methods. Specifically, the framework classifies attention patterns as either predictable, exhibiting clear regularities, or unpredictable, appearing essentially random. Our analysis further reveals that the distinction between the two can be explained by variations in query self-similarity along the temporal dimension. Focusing on the predictable class, we provide a detailed mathematical analysis of three representative patterns in terms of the joint effect of queries, keys, and Rotary Positional Embeddings (RoPE). To validate the framework, we apply it to KV cache compression and LLM pruning; in both tasks, a simple metric inspired by our theory consistently improves performance over baseline methods.
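To make the notion of "query self-similarity along the temporal dimension" concrete, the sketch below shows one plausible way to score it for a single attention head: the mean cosine similarity between a head's query vectors at consecutive time steps. This is only an illustrative assumption, not the paper's actual metric; the function name `query_self_similarity`, the `(seq_len, head_dim)` tensor layout, and the choice of cosine similarity are all hypothetical.

```python
# Illustrative sketch (not the paper's implementation): score how similar a
# head's query vectors are across adjacent time steps. In the framework above,
# high temporal self-similarity would correspond to predictable attention
# patterns and low self-similarity to unpredictable ones.
import torch


def query_self_similarity(queries: torch.Tensor) -> torch.Tensor:
    """Mean cosine similarity between consecutive query vectors.

    queries: hypothetical tensor of shape (seq_len, head_dim) holding one
             head's query vectors over time.
    Returns a scalar tensor in [-1, 1].
    """
    q = torch.nn.functional.normalize(queries, dim=-1)  # unit-length queries
    sims = (q[1:] * q[:-1]).sum(dim=-1)                 # cos(q_t, q_{t-1})
    return sims.mean()


# Usage: a smoothly drifting query stream scores near 1, random queries near 0.
smooth = torch.cumsum(0.01 * torch.randn(128, 64), dim=0) + torch.randn(1, 64)
noisy = torch.randn(128, 64)
print(query_self_similarity(smooth).item(), query_self_similarity(noisy).item())
```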