

Single-pass detection of jailbreaking input in large language models

Leyla Naz Candogan ⋅ Yongtao Wu ⋅ Elias Abad Rocamora ⋅ Grigorios Chrysos ⋅ Volkan Cevher

Abstract
