Poster

Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols

Mikhail Terekhov · Alexander Panfilov · Daniil Dzenhaliou · Caglar Gulcehre · Maksym Andriushchenko · Ameya Prabhu · Jonas Geiping

Abstract
