

Poster Sat, Apr 25, 2026 • 11:15 AM – 1:45 PM PDT Pavilion 3 P3-#1501

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Junlong Li ⋅ Wenshuo Zhao ⋅ Jian Zhao ⋅ Weihao Zeng ⋅ Haoze Wu ⋅ Xiaochen Wang ⋅ Rui Ge ⋅ Yuxuan Cao ⋅ Yuzhen Huang ⋅ Wei Liu ⋅ Junteng Liu ⋅ Zhaochen Su ⋅ Yiyang Guo ⋅ Fan Zhou ⋅ Lueyang Zhang ⋅ Juan Michelini ⋅ Xingyao Wang ⋅ Xiang Yue ⋅ Shuyan Zhou ⋅ Graham Neubig ⋅ Junxian He

Abstract
