

ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code

Xiangru Tang · Yuliang Liu · Zefan Cai · Daniel Shao · Junjie Lu · Yichi Zhang · Zexuan Deng · Helan Hu · Kaikai An · Ruijun Huang · Shuzheng Si · Chen Sheng · Haozhe Zhao · Liang Chen · Tianyu Liu · Yujia Qin · Wangchunshu Zhou · Yilun Zhao · Zhiwei Jiang · Baobao Chang · Arman Cohan · Mark Gerstein
