LLM Hypnosis: Characterizing the Fragility of RLHF Against Unprivileged Knowledge Injection

Almog Hilel ⋅ Riddhi Bhagwat ⋅ Leshem Choshen ⋅ Idan Shenfeld ⋅ Jacob Andreas

Abstract
