LAYER-DEPENDENT STRUCTURE IN GRADIENT NOISE OF SMALL CONVOLUTIONAL NETWORKS
Abstract
Despite the remarkable success of deep learning, many aspects of training dynamics remain poorly understood. In particular, it is unclear whether the stochastic gradient updates arising from different random initializations share any reproducible structure. In this work, we conduct a systematic empirical study of small convolutional neural networks (CNNs) trained on standard vision datasets, asking whether patterns in gradient noise are consistent across independent training runs. We track per-layer gradient norms, directions, and correlations over multiple random seeds and observe stable, layer-dependent trends: early layers consistently show higher directional alignment of gradient updates across runs, whereas deeper layers show greater variability. These patterns persist across the architectures, datasets, and optimization settings we tested, suggesting that gradient noise may contain structured components beyond purely random fluctuations. Rather than aiming to establish definitive laws, this study offers an exploratory experimental framework for probing stochastic gradient dynamics and highlights empirical regularities that may inform future theoretical and experimental investigations of deep learning.
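
As an illustration of the kind of per-layer directional comparison described above, the following minimal sketch computes the mean pairwise cosine similarity of flattened gradients across seeds, one value per layer. The abstract does not specify the alignment metric, so the choice of cosine similarity, the function names `cosine` and `layer_alignment`, the input layout, and the layer labels are all assumptions made for illustration. The toy vectors are hand-picked to exercise the code and are not experimental data.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two flat gradient vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention assumed here: a zero gradient has no direction
    return dot / (norm_u * norm_v)


def layer_alignment(grads_by_seed):
    """Mean pairwise cosine similarity across seeds, computed per layer.

    grads_by_seed maps seed -> {layer_name: flattened gradient}.
    This layout is hypothetical; the paper's logging format is not given.
    """
    seeds = sorted(grads_by_seed)
    layers = list(grads_by_seed[seeds[0]])
    result = {}
    for layer in layers:
        sims = [
            cosine(grads_by_seed[a][layer], grads_by_seed[b][layer])
            for a, b in combinations(seeds, 2)
        ]
        result[layer] = sum(sims) / len(sims)
    return result


# Toy input (illustrative only, NOT results from the study):
# three seeds, two hypothetical layers with 2-dimensional gradients.
toy = {
    0: {"layer_a": [1.0, 0.0], "layer_b": [1.0, 0.0]},
    1: {"layer_a": [2.0, 0.0], "layer_b": [0.0, 1.0]},
    2: {"layer_a": [3.0, 0.0], "layer_b": [-1.0, 0.0]},
}
alignment = layer_alignment(toy)
```

In this toy input, the `layer_a` gradients point in the same direction for every seed, so the score is 1, while the `layer_b` gradients are orthogonal or opposed, so the score is negative. In a real experiment the flattened gradients would be recorded at matching training steps for each seed, and a per-layer score near zero would indicate no consistent direction across runs.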