

(7 events)
The 2026 schedule is still incomplete
Oral
Thu Apr 23 11:15 AM -- 11:25 AM (PDT)
Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
Zhaomin Wu · Mingzhe Du · See-Kiong Ng · Bingsheng He
Oral
Thu Apr 23 11:27 AM -- 11:37 AM (PDT)
Is it Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort
Xinpeng Wang · Nitish Joshi · Barbara Plank · Rico Angell · He He
Oral
Thu Apr 23 11:39 AM -- 11:49 AM (PDT)
LLMs Get Lost In Multi-Turn Conversation
Philippe Laban · Hiroaki Hayashi · Yingbo Zhou · Jennifer Neville
Oral
Thu Apr 23 11:51 AM -- 12:01 PM (PDT)
How Reliable is Language Model Micro-Benchmarking?
Gregory Yauney · Shahzaib Warraich · Swabha Swayamdipta
Oral
Thu Apr 23 12:03 PM -- 12:13 PM (PDT)
AdAEM: An Adaptively and Automated Extensible Evaluation Method of LLMs' Value Difference
Jing Yao · Shitong Duan · Xiaoyuan Yi · Dongkuan Xu · Peng Zhang · Tun Lu · Ning Gu · Zhicheng Dou · Xing Xie
Oral
Thu Apr 23 12:15 PM -- 12:25 PM (PDT)
What's In My Human Feedback? Learning Interpretable Descriptions of Preference Data
Rajiv Movva · Smitha Milli · Sewon Min · Emma Pierson
Oral
Thu Apr 23 12:27 PM -- 12:37 PM (PDT)
EigenBench: A Comparative Behavioral Measure of Value Alignment
Jonathn Chang · Leonhard Piff · Suvadip Sana · Jasmine Li · Lionel Levine