

Poster

ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists

Jie Ruan · Inderjeet Nair · Shuyang Cao · Amy Liu · Sheza Munir · Micah Pollens-Dempsey · Yune-Ting Chiang · Lucy Kates · Nicholas David · Sihan Chen · Ruxin Yang · Yuqian Yang · Jihyun Gump · Tessa Bialek · Vivek Sankaran · Margo Schlanger · Lu Wang

Abstract
