Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint

Jimmy Ba; Murat Erdogdu; Taiji Suzuki; Denny Wu; Tianzong Zhang

Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint

Jimmy Ba, Murat Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang

Keywords: generalization, inductive bias, regression

Abstract Paper Reviews

Monday: Optimisation Theory

Abstract: This paper investigates the generalization properties of two-layer neural networks in high-dimensions, i.e. when the number of samples $n$, features $d$, and neurons $h$ tend to infinity at the same rate. Specifically, we derive the exact population risk of the unregularized least squares regression problem with two-layer neural networks when either the first or the second layer is trained using a gradient flow under different initialization setups. When only the second layer coefficients are optimized, we recover the \textit{double descent} phenomenon: a cusp in the population risk appears at $h\approx n$ and further overparameterization decreases the risk. In contrast, when the first layer weights are optimized, we highlight how different scales of initialization lead to different inductive bias, and show that the resulting risk is \textit{independent} of overparameterization. Our theoretical and experimental results suggest that previously studied model setups that provably give rise to \textit{double descent} might not translate to optimizing two-layer neural networks.

Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint

Jimmy Ba, Murat Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang

Similar Papers

Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks

Ziwei Ji, Matus Telgarsky,

Span Recovery for Deep Neural Networks with Applications to Input Obfuscation

Rajesh Jayaram, David P. Woodruff, Qiuyi Zhang,

Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

Pan Xu, Felicia Gao, Quanquan Gu,