

Search All 2024 Events
 

Workshop
TaskBench: Benchmarking Large Language Models for Task Automation
Yongliang Shen · Kaitao Song · Xu Tan · Wenqi Zhang · Kan Ren · Siyu Yuan · Weiming Lu · Dongsheng Li · Yueting Zhuang
Workshop
How aligned are different alignment metrics?
Jannis Ahlert · Thomas Klein · Felix Wichmann · Robert Geirhos
Workshop
Lessons learned in the study of representational alignment in physical reasoning
Felix Jedidja Binder · Rahul Venkatesh · Daniel L Yamins · Judith Fan
Workshop
LLF-Bench: Benchmark for Interactive Learning from Language Feedback
Ching-An Cheng · Andrey Kolobov · Dipendra Kumar Misra · Allen Nie · Adith Swaminathan
Workshop
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
Jian Xie · Kai Zhang · Jiangjie Chen · Tinghui Zhu · Renze Lou · Yuandong Tian · Yanghua Xiao · Yu Su
Workshop
FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning
Wenzhe Li · Zihan Ding · Seth Karten · Chi Jin
Workshop
MARS: A Benchmark for Multi-LLM Algorithmic Routing System
Qitian Hu · Jacob Bieker · Xiuyu Li · Nan Jiang · Benjamin Keigwin · Gaurav Ranganath · Kurt Keutzer · Shriyash Upadhyay
Workshop
Sat 6:20 The Protein Engineering Tournament: An Open Science Benchmark for Protein Modeling and Design
Workshop
Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning
Mohamed Aghzal · Erion Plaku · Ziyu Yao
Workshop
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
Tongxin Yuan · Zhiwei He · Lingzhong Dong · Yiming Wang · Ruijie Zhao · Tian Xia · Lizhen Xu · Binglin Zhou · Li Fangqi · Zhuosheng Zhang · Rui Wang · Gongshen Liu
Workshop
Sat 2:15 Panel - Beyond Benchmarks: Machine Learning for the Planet
Ramona Pelich · Stefan Lang · Nico Lang · Matej Batič
Workshop
Medical Event Data Standard (MEDS): Facilitating Machine Learning for Health
Bert Arnrich · Edward Choi · Jason Fries · Matthew McDermott · Jungwoo Oh · Tom Pollard · Nigam Shah · Ethan Steinberg · Michael Wornow · Robin van de Water