Disentangling goal and framing for detecting LLM jailbreaks

Amirhossein Farzam ⋅ Majid Behbahani ⋅ Mani Malek ⋅ Yuriy Nevmyvaka ⋅ Guillermo Sapiro

Abstract
