SageAttention2: Efficient Attention with Smoothing Q and Per-thread Quantization

Jintao Zhang ⋅ Haofeng Huang ⋅ Pengle Zhang ⋅ Jia Wei ⋅ Jun Zhu ⋅ Jianfei Chen

Abstract
