70 Results
Workshop | | TaskBench: Benchmarking Large Language Models for Task Automation | Yongliang Shen · Kaitao Song · Xu Tan · Wenqi Zhang · Kan Ren · Siyu Yuan · Weiming Lu · Dongsheng Li · Yueting Zhuang
Workshop | | How aligned are different alignment metrics? | Jannis Ahlert · Thomas Klein · Felix Wichmann · Robert Geirhos
Workshop | | Lessons learned in the study of representational alignment in physical reasoning | Felix Jedidja Binder · Rahul Venkatesh · Daniel L Yamins · Judith Fan
Workshop | | LLF-Bench: Benchmark for Interactive Learning from Language Feedback | Ching-An Cheng · Andrey Kolobov · Dipendra Kumar Misra · Allen Nie · Adith Swaminathan
Workshop | | TravelPlanner: A Benchmark for Real-World Planning with Language Agents | Jian Xie · Kai Zhang · Jiangjie Chen · Tinghui Zhu · Renze Lou · Yuandong Tian · Yanghua Xiao · Yu Su
Workshop | | FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning | Wenzhe Li · Zihan Ding · Seth Karten · Chi Jin
Workshop | | MARS: A Benchmark for Multi-LLM Algorithmic Routing System | Qitian Hu · Jacob Bieker · Xiuyu Li · Nan Jiang · Benjamin Keigwin · Gaurav Ranganath · Kurt Keutzer · Shriyash Upadhyay
Workshop | Sat 6:20 | The Protein Engineering Tournament: An Open Science Benchmark for Protein Modeling and Design |
Workshop | | Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning | Mohamed Aghzal · Erion Plaku · Ziyu Yao
Workshop | | R-Judge: Benchmarking Safety Risk Awareness for LLM Agents | Tongxin Yuan · Zhiwei He · Lingzhong Dong · Yiming Wang · Ruijie Zhao · Tian Xia · Lizhen Xu · Binglin Zhou · Li Fangqi · Zhuosheng Zhang · Rui Wang · Gongshen Liu
Workshop | Sat 2:15 | Panel - Beyond Benchmarks: Machine Learning for the Planet | Ramona Pelich · Stefan Lang · Nico Lang · Matej Batič
Workshop | | Medical Event Data Standard (MEDS): Facilitating Machine Learning for Health | Bert Arnrich · Edward Choi · Jason Fries · Matthew McDermott · Jungwoo Oh · Tom Pollard · Nigam Shah · Ethan Steinberg · Michael Wornow · Robin van de Water