Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models

Xianjun Yang ⋅ Xiao Wang ⋅ Qi Zhang ⋅ Linda Petzold ⋅ William Wang ⋅ Xun Zhao ⋅ Dahua Lin

Abstract
