Bounds on Over-Parameterization for Guaranteed Existence of Descent Paths in Shallow ReLU Networks

Arsalan Sharifnassab; Saber Salehkaleybar; S. Jamaloddin Golestani

Abstract: We study the landscape of squared loss in neural networks with one-hidden layer and ReLU activation functions. Let $m$ and $d$ be the widths of hidden and input layers, respectively. We show that there exist poor local minima with positive curvature for some training sets of size $n\geq m+2d-2$. By positive curvature of a local minimum, we mean that within a small neighborhood the loss function is strictly increasing in all directions. Consequently, for such training sets, there are initialization of weights from which there is no descent path to global optima. It is known that for $n\le m$, there always exist descent paths to global optima from all initial weights. In this perspective, our results provide a somewhat sharp characterization of the over-parameterization required for "existence of descent paths" in the loss landscape.

Bounds on Over-Parameterization for Guaranteed Existence of Descent Paths in Shallow ReLU Networks

Arsalan Sharifnassab, Saber Salehkaleybar, S. Jamaloddin Golestani

Similar Papers

On the Global Convergence of Training Deep Linear ResNets

Difan Zou, Philip M. Long, Quanquan Gu,

Universal Approximation with Certified Networks

Maximilian Baader, Matthew Mirman, Martin Vechev,

Span Recovery for Deep Neural Networks with Applications to Input Obfuscation

Rajesh Jayaram, David P. Woodruff, Qiuyi Zhang,