Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models

Xianjun Yang · Xiao Wang · Qi Zhang · Linda Petzold · William Wang · Xun Zhao · Dahua Lin

Abstract
