

Spotlight Poster

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game

Sam Toyer · Olivia Watkins · Ethan Mendes · Justin Svegliato · Luke Bailey · Tiffany Wang · Isaac Ong · Karim Elmaaroufi · Pieter Abbeel · Trevor Darrell · Alan Ritter · Stuart Russell
2024 Spotlight Poster

