A Generative Approach to LLM Harmfulness Detection with Red Flag Tokens

Sophie Xhonneux ⋅ David Dobre ⋅ Mehrnaz Mofakhami ⋅ Leo Schwinn ⋅ Gauthier Gidel

Abstract
