Poster Fri, Apr 24, 2026 • 11:15 AM – 1:45 PM PDT Pavilion 3 P3-#1014

Distilling to Hybrid Attention Models via KL-Guided Layer Selection

Yanhong Li ⋅ Songlin Yang ⋅ Shawn Tan ⋅ Mayank Mishra ⋅ Rameswar Panda ⋅ Jiawei Zhou ⋅ Yoon Kim

Abstract
