Oral in Workshop: How Far Are We From AGI

DEFT: FLASH TREE-ATTENTION WITH IO-AWARENESS FOR EFFICIENT TREE-SEARCH-BASED LLM INFERENCE

Jinwei Yao ⋅ Kexun Zhang ⋅ Kaiqi Chen ⋅ Jiaxuan You ⋅ Zeke Wang ⋅ Binhang Yuan ⋅ Tao Lin

Abstract
