

Deception in Dialogue: Evaluating and Mitigating Deceptive Behavior in Large Language Models

Marwa Abdulhai ⋅ Ryan Cheng ⋅ Aryansh Shrivastava ⋅ Natasha Jaques ⋅ Yarin Gal ⋅ Sergey Levine

Abstract
