Poster

SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget

Zihao Wang · Bin CUI · Shaoduo Gan
2025 Poster
