

LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking

Lukas Helff ⋅ Quentin Delfosse ⋅ David Steinmann ⋅ Ruben Härle ⋅ Hikaru Shindo ⋅ Patrick Schramowski ⋅ Wolfgang Stammer ⋅ Kristian Kersting ⋅ Felix Friedrich

Abstract
