

Large Language Models Generate Harmful Content Using a Unified Mechanism

Hadas Orgad ⋅ Boyi Wei ⋅ Kaden Zheng ⋅ Martin Wattenberg ⋅ Peter Henderson ⋅ Seraphina Goldfarb-Tarrant ⋅ Yonatan Belinkov

Abstract
