When Benchmarks Lie: Evaluating Malicious Prompt Classifiers Under True Distribution Shift

Max Fomin

Abstract
