In-Person Poster presentation / poster accept

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small

Kevin Wang ⋅ Alexandre Variengien ⋅ Arthur Conmy ⋅ Buck Shlegeris ⋅ Jacob Steinhardt
2023 In-Person Poster presentation / poster accept

Abstract

