When Lie Detectors Learn Model Identity: Confounds in Black-Box Sandbagging Detection

Lin Yulong ⋅ Pablo Bernabeu-Perez ⋅ Benjamin Arnav ⋅ Lennie Wells ⋅ Mary Phuong

Abstract
