In-context learning of representations can be explained by induction circuits
Abstract
Park et al. (2025) demonstrate that large language models can learn to trace random walks on graphs presented in context, and observe that token representations reorganize to reflect the underlying graph structure. This has been interpreted as evidence that models 'flexibly manipulate their representations' to reflect in-context semantics, and that this reorganization enables task performance. We offer a simpler mechanistic explanation. We first observe that task performance can be fully explained by induction circuits (Olsson et al., 2022), and show that ablating the attention heads that comprise these circuits substantially degrades performance. As for the geometric structure, we propose that it could result from previous-token heads effectively mixing the representations of graph neighbors. We show that a single round of such 'neighbor mixing' applied to random embeddings recreates the observed graph correspondence in PCA visualizations. These results suggest that the apparent 'representation reorganization' may be a byproduct of the model's induction circuits rather than a strategy critical to in-context learning.
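The neighbor-mixing claim above can be illustrated with a minimal numerical sketch. The graph (a ring), the node count, the embedding dimension, and the mixing weight `alpha` below are all illustrative assumptions, not the paper's actual experimental setup: we assign each node a random embedding, blend each node's vector once with the mean of its neighbors' vectors (as a previous-token head might effectively do in the residual stream), and check that graph neighbors become markedly closer than non-neighbors, so that a PCA projection would recover the ring layout.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n nodes arranged on a ring graph, standing in
# for the graph whose random walk is shown in context.
n, d = 20, 64
nbrs = [[(i - 1) % n, (i + 1) % n] for i in range(n)]

# Random token embeddings: no graph structure is present initially.
E = rng.standard_normal((n, d))

# One round of 'neighbor mixing' (assumed weight alpha = 0.5): each
# node's vector is blended with the mean of its neighbors' vectors.
alpha = 0.5
M = np.array([(1 - alpha) * E[i] + alpha * E[nbrs[i]].mean(axis=0)
              for i in range(n)])

def pca2(X):
    """Project rows of X onto their top two principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

def neighbor_ratio(X):
    """Mean distance between graph neighbors divided by mean
    distance between non-neighbor pairs (lower = more graph-like)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    adj = np.zeros((n, n), dtype=bool)
    for i, (a, b) in enumerate(nbrs):
        adj[i, a] = adj[i, b] = True
    off_diag = ~np.eye(n, dtype=bool)
    return D[adj].mean() / D[off_diag & ~adj].mean()

# Before mixing, neighbors are no closer than strangers (ratio near 1);
# after one round of mixing, neighbors are substantially closer, and
# pca2(M) lays the nodes out along the ring.
print(neighbor_ratio(E), neighbor_ratio(M))
```

The point of the sketch is that no task-driven "reorganization" is needed: a single linear mixing step on structureless random vectors already produces graph-shaped geometry under PCA.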