Poster in Workshop: Scientific Methods for Understanding Deep Learning (Sci4DL)

Representation Geometry Mediates Neural Circuit Formation: Evidence from Systematic Regularization Analysis

Hyunjun Kim

Abstract

Neural circuits enabling in-context learning emerge during training through a phenomenon called *grokking*: delayed generalization long after memorization. While prior work has documented grokking empirically, the relationship between representation geometry and circuit formation remains poorly understood. We establish a theoretical framework connecting the condition number $\kappa$ of weight matrices to representation quality and training stability, providing both causal analysis and spectral characterization of orthogonality regularization. Through a systematic comparison of regularization methods across 100 experiments (5 methods $\times$ 20 seeds each), we provide a controlled evaluation of geometry-based regularization on grokking and representation quality. When hyperparameters are properly tuned, **all approaches achieve similar grokking timing** (median epoch ~100) and 100% success rates. However, methods differ substantially in the **quality of learned representations**: spectral-norm SRIP achieves near-perfect conditioning ($\kappa \approx 1.35$), while baseline training yields $\kappa \approx 72$ and SVB produces poorly conditioned representations ($\kappa > 335$), a difference of over 53$\times$ between the best method and the baseline. Mediation analysis across 125 experiments reveals a modest relationship between condition number and grokking (5--25% mediation effect) that is only marginally significant (Sobel test $p = 0.065$, above the conventional 0.05 threshold), suggesting that geometry control has a partial rather than dominant influence on circuit formation. Lambda sensitivity experiments (27 runs) confirm that the regularization strength $\lambda$ systematically controls conditioning while grokking timing remains relatively stable.
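As a rough illustration of the two quantities discussed above, the following sketch computes the condition number $\kappa$ of a weight matrix and a spectral-norm orthogonality penalty in the style of SRIP. It is a minimal NumPy sketch, not the authors' implementation: the penalty form $\lambda \cdot \sigma_{\max}(W^\top W - I)$ follows the commonly cited definition of SRIP, and the function names, the default `lam`, and the matrix shapes are assumptions made for this example.

```python
import numpy as np


def condition_number(weight):
    """Return kappa = sigma_max / sigma_min of a weight matrix.

    Assumes the matrix has full rank (sigma_min > 0).
    """
    singular_values = np.linalg.svd(weight, compute_uv=False)  # sorted descending
    return float(singular_values[0] / singular_values[-1])


def srip_penalty(weight, lam=0.1):
    """Spectral-norm orthogonality penalty: lam * sigma_max(W^T W - I).

    `lam` is a hypothetical default; the abstract does not state the value used.
    """
    gram = weight.T @ weight
    deviation = gram - np.eye(gram.shape[0])
    # ord=2 on a 2-D array gives the largest singular value (spectral norm).
    return lam * float(np.linalg.norm(deviation, ord=2))


# Illustrative matrices (shapes chosen arbitrarily for the example).
rng = np.random.default_rng(0)
random_w = rng.normal(size=(64, 32))
# Reduced QR yields a 64x32 matrix with orthonormal columns.
orthonormal_w, _ = np.linalg.qr(random_w)

print("random:      kappa =", condition_number(random_w),
      " penalty =", srip_penalty(random_w))
print("orthonormal: kappa =", condition_number(orthonormal_w),
      " penalty =", srip_penalty(orthonormal_w))
```

A matrix with orthonormal columns has $\kappa = 1$ and a penalty of essentially zero, which is the regime that $\kappa \approx 1.35$ approaches. An unregularized random matrix sits further from it on both measures.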
We extend these findings to language modeling on WikiText-2 (15 experiments, ~29M-parameter transformer), where LinearSRIP achieves 44.3% lower final perplexity (747.4 vs 1341.4) and halves training degradation (8.8$\times$ vs 17.6$\times$ best-to-final perplexity ratio), with a 49.5% lower condition number ($\kappa = 48.6$ vs $96.2$). Our results indicate that geometry control is a general principle for representation quality, affecting both circuit formation in algorithmic tasks and training stability in language modeling. We recommend spectral-norm SRIP for applications requiring well-conditioned representations, such as interpretability research, downstream transfer, or robustness to input perturbations.
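The percentages quoted for the WikiText-2 comparison can be checked with a few lines of arithmetic. The sketch below uses only the figures reported above. The helper names are hypothetical, and reading the "best-to-final perplexity ratio" as final perplexity divided by best perplexity is an assumption, since the abstract does not define it.

```python
def relative_reduction(baseline, treated):
    """Fractional reduction of `treated` relative to `baseline`."""
    return (baseline - treated) / baseline


def degradation_ratio(best_ppl, final_ppl):
    """Assumed definition: how much perplexity worsens from its best value."""
    return final_ppl / best_ppl


# Final perplexity: baseline 1341.4 vs LinearSRIP 747.4 (from the abstract).
ppl_reduction = relative_reduction(1341.4, 747.4)

# Condition number: baseline 96.2 vs LinearSRIP 48.6 (from the abstract).
kappa_reduction = relative_reduction(96.2, 48.6)

# Reported degradation ratios: 17.6x (baseline) vs 8.8x (LinearSRIP).
degradation_improvement = 17.6 / 8.8

print(f"perplexity reduction:       {ppl_reduction:.1%}")
print(f"condition number reduction: {kappa_reduction:.1%}")
print(f"degradation improvement:    {degradation_improvement:.1f}x")
```

The computed reductions round to the 44.3% and 49.5% figures quoted above, and the ratio of the two degradation factors is 2.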
