ICLR Poster A NON-PARAMETRIC REGRESSION VIEWPOINT : GENERALIZATION OF OVERPARAMETRIZED DEEP RELU NETWORK UNDER NOISY OBSERVATIONS

Namjoon Suh · Hyunouk Ko · Xiaoming Huo

Keywords: [ neural tangent kernel ] [ minimax ]

Abstract: We study the generalization properties of overparametrized deep neural networks (DNNs) with Rectified Linear Unit (ReLU) activations. Under the non-parametric regression framework, we assume that the ground-truth function lies in a reproducing kernel Hilbert space (RKHS) induced by the neural tangent kernel (NTK) of a ReLU DNN, and that the dataset is observed with noise. We prove that, without a careful use of early stopping, an overparametrized DNN trained by vanilla gradient descent does not recover the ground-truth function: the $L_{2}$ prediction error of the estimated DNN is bounded away from $0$. Complementing this result, we show that $\ell_{2}$-regularized gradient descent enables the overparametrized DNN to achieve the minimax optimal convergence rate of the $L_{2}$ prediction error without early stopping. Notably, the rate we obtain is faster than the $\mathcal{O}(n^{-1/2})$ rate known in the literature.
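The contrast between vanilla and $\ell_{2}$-regularized gradient descent can be illustrated with a small toy sketch. This is not the paper's setting or method: it assumes a random-feature model (a fixed random ReLU first layer with only the output weights trained) as a rough linearized stand-in for an overparametrized network, and the target function, noise level, penalty `mu`, and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def relu_features(x, W, b):
    """Random ReLU features; W and b stay fixed, only output weights are trained."""
    return np.maximum(x @ W + b, 0.0) / np.sqrt(W.shape[1])


def gradient_descent(Phi, y, mu, lr, steps):
    """Run GD from w = 0 on (1/2n)||Phi w - y||^2 + (mu/2)||w||^2."""
    n = Phi.shape[0]
    w = np.zeros(Phi.shape[1])
    for _ in range(steps):
        grad = Phi.T @ (Phi @ w - y) / n + mu * w
        w -= lr * grad
    return w


def run_toy_experiment(n=40, width=400, noise=0.5, mu=0.05, steps=2000, seed=0):
    rng = np.random.default_rng(seed)

    def target(x):
        # Hypothetical ground-truth function (an assumption, not from the paper).
        return np.sin(2.0 * x[:, 0])

    # Noisy training observations and a noiseless test grid.
    x_train = rng.uniform(-2.0, 2.0, size=(n, 1))
    y_train = target(x_train) + noise * rng.normal(size=n)
    x_test = np.linspace(-2.0, 2.0, 200)[:, None]

    # Overparametrized regime: many more features (width) than samples (n).
    W = rng.normal(size=(1, width))
    b = rng.normal(size=width)
    Phi_train = relu_features(x_train, W, b)
    Phi_test = relu_features(x_test, W, b)

    # Shared step size 1/L, with L the largest Hessian eigenvalue of the
    # regularized objective, so both runs are stable.
    lr = 1.0 / (np.linalg.norm(Phi_train, 2) ** 2 / n + mu)

    results = {}
    for name, penalty in [("vanilla", 0.0), ("l2", mu)]:
        w = gradient_descent(Phi_train, y_train, penalty, lr, steps)
        results[name] = {
            "train_mse": float(np.mean((Phi_train @ w - y_train) ** 2)),
            "test_mse": float(np.mean((Phi_test @ w - target(x_test)) ** 2)),
            "weight_norm": float(np.linalg.norm(w)),
        }
    return results


if __name__ == "__main__":
    for name, stats in run_toy_experiment().items():
        print(name, stats)
```

Without a penalty, gradient descent keeps reducing the training error and so fits the noise more closely, while the penalized run settles at a smaller-norm solution with a larger training error. Comparing the two `test_mse` values across seeds and sample sizes gives an informal feel for the abstract's claim, though this toy model says nothing about the minimax rates themselves.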
