

Poster

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Bill Yuchen Lin ⋅ Yuntian Deng ⋅ Khyathi Chandu ⋅ Abhilasha Ravichander ⋅ Valentina Pyatkin ⋅ Nouha Dziri ⋅ Ronan Le Bras ⋅ Yejin Choi
2025 Poster
