

Noticing the Watcher: LLM Agents Can Infer CoT Monitoring from Blocking Feedback

Thomas Jiralerspong ⋅ Flemming Kondrup ⋅ Yoshua Bengio

Abstract
