Exploration Hacking: LLMs Can Learn to Resist RL Training

Eyon Jang ⋅ Damon Falck ⋅ Joschka Braun ⋅ Nathalie Kirch ⋅ Achyutha Menon ⋅ Perusha Moodley ⋅ Scott Emmons ⋅ Roland Zimmermann ⋅ David Lindner

Abstract
