Poster

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal

Tinghao Xie · Xiangyu Qi · Yi Zeng · Yangsibo Huang · Udari Sehwag · Kaixuan Huang · Luxi He · Boyi Wei · Dacheng Li · Ying Sheng · Ruoxi Jia · Bo Li · Kai Li · Danqi Chen · Peter Henderson · Prateek Mittal
2025 Poster
