The Geometry of Spectral Gradient Descent: Layerwise Criteria for SignSGD vs SpecSGD
Laura Gomezjurado Gonzalez ⋅ Mahdi Ghaznavi ⋅ Hiroki Naganuma ⋅ Ioannis Mitliagkas
Abstract
Optimization in deep learning has expanded beyond Euclidean methods to include entrywise sign updates (SignSGD) and spectral sign updates (SpecGD/Muon). While both can be viewed as steepest descent under non-Euclidean geometries (the $\ell_\infty$ and spectral norms, respectively), existing theory analyzes them in isolation. This leaves a practical gap: for a specific layer with a finite batch size, which geometry yields better optimization progress? We address this by deriving a unified bound on the expected one-step loss improvement that accounts for both the local geometry of the gradient signal and the geometry of the stochastic noise. We introduce the \emph{Spec-Sign Advantage Index}, a layerwise scalar criterion that determines which optimizer is preferable for a given layer. Our analysis reveals that SpecGD is preferred not simply when gradients are low-rank, but when the nuclear-norm signal-to-noise ratio exceeds the $\ell_1$-norm signal-to-noise ratio, generalizing recent deterministic criteria to the stochastic regime.
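The comparison described above can be sketched in code. The following is a minimal, illustrative sketch, assuming the index is the gap between a nuclear-norm and an $\ell_1$-norm signal-to-noise ratio estimated from per-sample minibatch gradients; the function names, noise estimator, and numerical constants are hypothetical and not the paper's reference implementation.

```python
# Illustrative sketch (assumed, not the paper's implementation): a layerwise
# "Spec-Sign Advantage Index" computed as the difference between a nuclear-norm
# SNR and an l1-norm SNR, following the criterion stated in the abstract.
import numpy as np

def sign_update(grad: np.ndarray) -> np.ndarray:
    """SignSGD direction: entrywise sign of the gradient."""
    return np.sign(grad)

def spectral_update(grad: np.ndarray) -> np.ndarray:
    """SpecGD/Muon-style direction: U V^T from the thin SVD of the gradient."""
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ vt

def spec_sign_advantage_index(mean_grad: np.ndarray,
                              per_sample_grads: np.ndarray) -> float:
    """
    Hypothetical layerwise criterion: positive values favor the spectral
    geometry (SpecGD), negative values favor the entrywise sign geometry.

    mean_grad:        (m, n) minibatch mean gradient for one layer.
    per_sample_grads: (b, m, n) per-sample gradients from the same minibatch.
    """
    noise = per_sample_grads - mean_grad  # per-sample deviations from the mean
    # Signal norms of the mean gradient.
    nuclear_signal = np.linalg.norm(mean_grad, ord="nuc")
    l1_signal = np.abs(mean_grad).sum()
    # Average noise magnitude measured in the matching norms.
    nuclear_noise = np.mean([np.linalg.norm(d, ord="nuc") for d in noise])
    l1_noise = np.mean([np.abs(d).sum() for d in noise])
    # Compare the two signal-to-noise ratios.
    return nuclear_signal / (nuclear_noise + 1e-12) - l1_signal / (l1_noise + 1e-12)

# Example: a near low-rank signal with dense noise tends to favor SpecGD.
rng = np.random.default_rng(0)
signal = np.outer(rng.normal(size=64), rng.normal(size=32))    # rank-1 signal
samples = signal + 0.1 * rng.normal(size=(8, 64, 32))          # noisy per-sample grads
index = spec_sign_advantage_index(samples.mean(axis=0), samples)
print(f"Spec-Sign Advantage Index: {index:+.3f}")
```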