Attack Selection Reduces Safety in Concentrated AI Control Settings against Trusted Monitoring

Joachim Schaeffer ⋅ Arjun Khandelwal ⋅ Tyler Tracy

Abstract
