Better Attacks for Better Monitors: Semi-Automated Red-Teaming for Agent Monitoring

Monika Jotautaitė ⋅ Angel Martinez ⋅ Tyler Tracy ⋅ Ollie Matthews

Abstract
