Entropy-Lens: Uncovering Decision Strategies in LLMs
Abstract
Transformer blocks iteratively refine next-token distributions, yet most interpretability tools analyze hidden states rather than token-space dynamics. We introduce Entropy-Lens, a model-agnostic method that tracks the entropy of logit-lens predictions across layers, yielding an entropy profile: a per-layer scalar summary of token prediction dynamics that is invariant to permutations of the vocabulary. Entropy differences between consecutive layers act as a proxy for two strategies: expansion (entropy increases, so more candidate tokens are considered) and pruning (entropy decreases, so fewer candidates remain). Across model families and scales, entropy profiles show stable, family-specific token prediction dynamics and exhibit depth-rescaling invariance. Finally, selectively skipping the layers associated with maximal expansion or maximal pruning shows that the two strategies have unequal functional importance for downstream multiple-choice accuracy, with expansion typically being more critical.
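To make the entropy profile concrete, the following is a minimal sketch rather than the reference implementation. It assumes the per-layer hidden states for a single token position are already available as a NumPy array, and that a plain unembedding matrix maps them to vocabulary logits; any final normalization that a logit lens would normally apply before unembedding is assumed to have been done by the caller. The names `entropy_profile`, `label_strategies`, `hidden_states`, and `unembed` are illustrative and do not come from the paper.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def entropy_profile(hidden_states, unembed):
    """Shannon entropy (in nats) of the logit-lens distribution at each layer.

    hidden_states: array of shape (n_layers, d_model), one vector per layer
                   (assumed already normalized as the model's head expects).
    unembed:       array of shape (d_model, vocab_size), hypothetical
                   unembedding matrix.
    """
    logits = hidden_states @ unembed            # (n_layers, vocab_size)
    probs = softmax(logits)
    # Treat 0 * log(0) as 0 by replacing zero probabilities with 1 inside the log.
    safe = np.where(probs > 0, probs, 1.0)
    return -(probs * np.log(safe)).sum(axis=-1)  # (n_layers,)


def label_strategies(profile):
    """Label each layer-to-layer transition by the sign of the entropy change."""
    labels = []
    for delta in np.diff(profile):
        if delta > 0:
            labels.append("expansion")   # entropy rose: more candidates
        elif delta < 0:
            labels.append("pruning")     # entropy fell: fewer candidates
        else:
            labels.append("neutral")
    return labels
```

Because the entropy is a sum over vocabulary entries, reordering the columns of `unembed` leaves the profile unchanged up to floating-point rounding, which is the permutation invariance referred to above.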