GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models

Haibo Jin ⋅ Ruoxi Chen ⋅ Andy Zhou ⋅ Yang Zhang ⋅ Haohan Wang

Abstract