

Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks

Benjamin Bowman · Guido Montufar

Keywords: [ gradient flow ] [ implicit bias ] [ neural tangent kernel ] [ implicit regularization ]

Abstract: We study the dynamics of a neural network in function space when optimizing the mean squared error via gradient flow. We show that in the underparameterized regime, the network learns eigenfunctions of an integral operator $T_K$ determined by the Neural Tangent Kernel (NTK) at rates corresponding to their eigenvalues. For example, for uniformly distributed data on the sphere $S^{d - 1}$ and rotation-invariant weight distributions, the eigenfunctions of $T_K$ are the spherical harmonics. Our results can be understood as describing a spectral bias in the underparameterized regime. The proofs use the concept of "Damped Deviations", where deviations of the NTK matter less for eigendirections with large eigenvalues. Beyond the underparameterized regime, the damped deviations point of view allows us to extend certain existing results in the overparameterized setting.
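The statement that eigenfunctions are learned "at rates corresponding to their eigenvalues" can be illustrated with a small numerical sketch. This is not the paper's method or proof. It assumes an idealized setting in which the kernel stays fixed during training, so the residual $r$ on $n$ training points follows $\dot r = -(K/n)\, r$ and its coefficient along the $i$-th eigenvector decays like $e^{-\lambda_i t}$. The Gaussian kernel, the sample size, and the function name `residual_coefficients` are assumptions made for illustration only, and the Gaussian kernel is a stand-in for the NTK.

```python
import numpy as np

def residual_coefficients(K, r0, t):
    """Residual coefficients in the eigenbasis of K/n at time t,
    under the fixed-kernel gradient flow dr/dt = -(K/n) r."""
    n = K.shape[0]
    eigvals, eigvecs = np.linalg.eigh(K / n)  # eigenvalues in ascending order
    c0 = eigvecs.T @ r0                       # initial coefficients
    ct = c0 * np.exp(-eigvals * t)            # each mode decays at its own rate
    return eigvals, ct, eigvecs

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # points on the sphere S^{d-1}

# Stand-in kernel (assumption): a Gaussian kernel, NOT the actual NTK.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists)

r0 = rng.standard_normal(n)  # hypothetical initial residual f_0(x_i) - y_i
t = 5.0
eigvals, ct, eigvecs = residual_coefficients(K, r0, t)

c0 = eigvecs.T @ r0
decay = np.abs(ct) / np.abs(c0)  # fraction of each mode remaining at time t
rt = eigvecs @ ct                # residual at time t in the original basis

print("remaining fraction, smallest eigenvalue:", decay[0])
print("remaining fraction, largest eigenvalue: ", decay[-1])
```

In this sketch the mode with the largest eigenvalue is fit fastest and the modes with the smallest eigenvalues barely move, which is the spectral bias the abstract describes. For a real network the kernel changes during training, and controlling that deviation is what the abstract attributes to the damped deviations argument.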
