Poster

Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning

Seanie Lee ⋅ Minsu Kim ⋅ Lynn Cherif ⋅ David Dobre ⋅ Juho Lee ⋅ Sung Ju Hwang ⋅ Kenji Kawaguchi ⋅ Gauthier Gidel ⋅ Yoshua Bengio ⋅ Nikolay Malkin ⋅ Moksh Jain
2025 Poster

Abstract

