Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Boyi Wei ⋅ Kaixuan Huang ⋅ Yangsibo Huang ⋅ Tinghao Xie ⋅ Xiangyu Qi ⋅ Mengzhou Xia ⋅ Prateek Mittal ⋅ Mengdi Wang ⋅ Peter Henderson

Abstract
