Poster Thu, Apr 23, 2026 • 11:15 AM – 1:45 PM PDT Pavilion 4 P4-#5013

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Mike Merrill ⋅ Alexander Shaw ⋅ Nicholas Carlini ⋅ Boxuan Li ⋅ Harsh Raj ⋅ Ivan Bercovich ⋅ Lin Shi ⋅ Jeong Shin ⋅ Thomas Walshe ⋅ E. Kelly Buchanan ⋅ Junhong Shen ⋅ Guanghao Ye ⋅ Haowei Lin ⋅ Jason Poulos ⋅ Maoyu Wang ⋅ Marianna Nezhurina ⋅ Di Lu ⋅ Orfeas Menis Mastromichalakis ⋅ Zhiwei Xu ⋅ Zizhao Chen ⋅ Yue Liu ⋅ Robert Zhang ⋅ Leon Liangyu Chen ⋅ Anurag Kashyap ⋅ Jan-Lucas Uslu ⋅ Jeffrey Li ⋅ Jianbo Wu ⋅ Minghao Yan ⋅ Song Bian ⋅ Vedang Sharma ⋅ Ke Sun ⋅ Steven Dillmann ⋅ Akshay Anand ⋅ Andrew Lanpouthakoun ⋅ Bardia Koopah ⋅ Changran Hu ⋅ Etash Guha ⋅ Gabriel Dreiman ⋅ Jiacheng Zhu ⋅ Karl Krauth ⋅ Li Zhong ⋅ Niklas Muennighoff ⋅ Robert Amanfu ⋅ Shangyin Tan ⋅ Shreyas Pimpalgaonkar ⋅ Tushar Aggarwal ⋅ Xiangning Lin ⋅ Xin Lan ⋅ Xuandong Zhao ⋅ Yiqing Liang ⋅ Yuanli Wang ⋅ Zilong (Ryan) Wang ⋅ Changzhi Zhou ⋅ David Heineman ⋅ Hange Liu ⋅ Harsh Trivedi ⋅ John Yang ⋅ Junhong Lin ⋅ Manish Shetty ⋅ Michael Yang ⋅ Nabil Omi ⋅ Negin Raoof ⋅ Shanda Li ⋅ Terry Yue Zhuo ⋅ Wuwei Lin ⋅ Yiwei Dai ⋅ Yuxin Wang ⋅ Wenhao Chai ⋅ Shang Zhou ⋅ Dariush Wahdany ⋅ Ziyu She ⋅ Jiaming Hu ⋅ Zhikang Dong ⋅ Yuxuan Zhu ⋅ Sasha Cui ⋅ Ahson Saiyed ⋅ Arinbjörn Kolbeinsson ⋅ Christopher Rytting ⋅ Ryan Marten ⋅ Yixin Wang ⋅ Jenia Jitsev ⋅ Alex Dimakis ⋅ Andy Konwinski ⋅ Ludwig Schmidt

Abstract
