(6 events)
Oral
Fri Apr 25 12:30 AM -- 12:42 AM (PDT) @ Garnet 213-215
From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions
Changle Qu · Sunhao Dai · Xiaochi Wei · Hengyi Cai · Shuaiqiang Wang · Dawei Yin · Jun Xu · Ji-Rong Wen
Oral
Fri Apr 25 12:42 AM -- 12:54 AM (PDT) @ Garnet 213-215
Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
Fangyu Lei · Jixuan Chen · Yuxiao Ye · Ruisheng Cao · Dongchan Shin · Hongjin SU · Zhaoqing Suo · Hongcheng Gao · Wenjing Hu · Pengcheng Yin · Victor Zhong · Caiming Xiong · Ruoxi Sun · Qian Liu · Sida Wang · Tao Yu
Oral
Fri Apr 25 12:54 AM -- 01:06 AM (PDT) @ Garnet 213-215
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo · Minh Chien Vu · Jenny Chim · Han Hu · Wenhao Yu · Ratnadira Widyasari · Imam Nur Bani Yusuf · Haolan Zhan · Junda He · Indraneil Paul · Simon Brunner · Chen GONG · James Hoang · Armel Zebaze · Xiaoheng Hong · Wen-Ding Li · Jean Kaddour · Ming Xu · Zhihan Zhang · Prateek Yadav · Naman Jain · Alex Gu · Zhoujun Cheng · Jiawei Liu · Qian Liu · Zijian Wang · David Lo · Binyuan Hui · Niklas Muennighoff · Daniel Fried · Xiaoning Du · Harm de Vries · Leandro Von Werra
Oral
Fri Apr 25 01:06 AM -- 01:18 AM (PDT) @ Garnet 213-215
LLM-SR: Scientific Equation Discovery via Programming with Large Language Models
Parshin Shojaee · Kazem Meidani · Shashank Gupta · Amir Barati Farimani · Chandan Reddy
Oral
Fri Apr 25 01:18 AM -- 01:30 AM (PDT) @ Garnet 213-215
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models
Andy K Zhang · Neil Perry · Riya Dulepet · Joey Ji · Celeste Menders · Justin Lin · Eliot Jones · Gashon Hussein · Samantha Liu · Donovan Jasper · Pura Peetathawatchai · Ari Glenn · Vikram Sivashankar · Daniel Zamoshchin · Leo Glikbarg · Derek Askaryar · Haoxiang Yang · Aolin Zhang · Rishi Alluri · Nathan Tran · Rinnara Sangpisit · Kenny Oseleononmen · Dan Boneh · Daniel Ho · Percy Liang
Oral
Fri Apr 25 01:30 AM -- 01:42 AM (PDT) @ Garnet 213-215
AFlow: Automating Agentic Workflow Generation
Jiayi Zhang · Jinyu Xiang · Zhaoyang Yu · Fengwei Teng · XiongHui Chen · Jiaqi Chen · Mingchen Zhuge · Xin Cheng · Sirui Hong · Jinlin Wang · Bingnan Zheng · Bang Liu · Yuyu Luo · Chenglin Wu