Poster in Workshop: Geometry-grounded Representation Learning and Generative Modeling

Single-Token Features as Extreme Monosemanticity: Compositional Transition and Geometric Signatures in SAE Feature Space

Seonglae Cho ⋅ Kleyton da Costa ⋅ Zekun Wu ⋅ Ilham Wicaksono ⋅ Rishi Kalra ⋅ Adriano Koshiyama

Abstract

Sparse Autoencoders (SAEs) decompose neural network activations into interpretable features, but the structure of this feature space remains poorly understood. We study single-token features, the extreme endpoint of monosemanticity where a feature activates on one vocabulary item and its variants. Analyzing 3.5 million features across five models, we find these features concentrate in Layer 0, drop sharply with scale within residual-stream SAEs, and show distinctive activation and decoder-geometry signatures (high gap and purity; tighter decoder-space clustering). Decoder tracking reveals a sharp early-layer transition in GPT2-Small consistent with a boundary between token identity and compositional processing. At comparable scale, prevalence differs dramatically across SAE objectives: JumpReLU GemmaScope yields ~46× higher Layer 0 single-token prevalence than TopK LlamaScope. These results emphasize methodology matching as a key requirement for cross-SAE comparisons.
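
The abstract names "gap" and "purity" as activation signatures but does not define them. As a minimal sketch under assumed definitions (not the authors' method), purity is taken here as the share of a feature's total activation mass that falls on its top token, and gap as the difference between the top and second token's shares. The function name `purity_and_gap` and its arguments are hypothetical, and the sketch does not merge token variants (case, leading whitespace) as the paper's notion of a single-token feature would.

```python
import numpy as np


def purity_and_gap(token_ids, activations, vocab_size):
    """Assumed, illustrative metrics for one SAE feature.

    token_ids:   1-D array of non-negative int token ids where the feature was evaluated
    activations: 1-D array of the feature's non-negative activations at those positions
    vocab_size:  number of vocabulary items

    Returns (purity, gap), where purity is the top token's share of total
    activation mass and gap is top share minus second-highest share.
    These definitions are assumptions, not taken from the paper.
    """
    token_ids = np.asarray(token_ids)
    activations = np.asarray(activations, dtype=float)

    # Sum the activation mass contributed by each vocabulary item.
    mass = np.bincount(token_ids, weights=activations, minlength=vocab_size)
    total = mass.sum()
    if total <= 0:
        # A feature that never fires has no meaningful purity or gap.
        return 0.0, 0.0

    # Per-token share of the mass, sorted from largest to smallest.
    share = np.sort(mass / total)[::-1]
    purity = float(share[0])
    gap = float(share[0] - share[1]) if share.size > 1 else purity
    return purity, gap


# Toy usage: a feature that fires almost only on token id 3.
p, g = purity_and_gap([3, 3, 3, 1], [2.0, 3.0, 4.0, 1.0], vocab_size=5)
```

Under these assumed definitions, a feature would count as a single-token candidate when both values are close to 1. Any threshold is a design choice that the abstract does not specify.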
