Neural Chameleons: Language Models Can Learn to Hide Their Thoughts from Unseen Activation Monitors

Max McGuinness ⋅ Alex Serrano ⋅ Luke Bailey ⋅ Scott Emmons

Abstract
