Virtual presentation / poster accept
Agnostic Learning of General ReLU Activation Using Gradient Descent
Pranjal Awasthi · Alex Tang · Aravindan Vijayaraghavan
Keywords: [ learning theory ] [ global convergence ] [ learning ReLU ] [ agnostic learning ] [ Theory ]
Abstract:
We provide a convergence analysis of gradient descent for the problem of agnostically learning a single ReLU function under Gaussian distributions. Unlike prior work that studies the setting of zero bias, we consider the more challenging scenario in which the bias of the ReLU function is non-zero. Our main result establishes that, starting from random initialization, gradient descent outputs in a polynomial number of iterations, with high probability, a ReLU function whose error is within a constant factor of optimal, i.e., it is guaranteed to achieve an error of $O(OPT)$, where $OPT$ is the error of the best ReLU function. This is a significant improvement over existing guarantees for gradient descent, which only guarantee an error of $O(\sqrt{d \cdot OPT})$ even in the zero-bias case (Frei et al., 2020). We also provide finite sample guarantees, and obtain similar guarantees for a broader class of marginal distributions beyond Gaussians.
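To make the setting concrete, here is a minimal simulation sketch (not the paper's algorithm or analysis): gradient descent on the squared loss for a single ReLU $x \mapsto \max(0, w \cdot x + b)$ with non-zero bias, with Gaussian inputs and a small amount of label noise standing in for the agnostic setting. All dimensions, step sizes, and iteration counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10      # input dimension (assumed)
n = 5000    # sample size (assumed)

# Target ReLU with non-zero bias, plus small label noise (agnostic flavor).
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
b_star = 0.5
X = rng.standard_normal((n, d))            # Gaussian marginal
y = np.maximum(X @ w_star + b_star, 0.0) + 0.01 * rng.standard_normal(n)

# Random initialization, as in the abstract's setting.
w = rng.standard_normal(d) / np.sqrt(d)
b = 0.0
lr = 0.05

for _ in range(500):
    z = X @ w + b
    resid = np.maximum(z, 0.0) - y
    active = (z > 0).astype(float)         # ReLU (sub)gradient indicator
    w -= lr * ((resid * active) @ X / n)
    b -= lr * (resid * active).mean()

mse = np.mean((np.maximum(X @ w + b, 0.0) - y) ** 2)
print(f"final squared loss: {mse:.4f}")
```

The paper's contribution is the convergence guarantee for this kind of iteration, not the iteration itself; the sketch only illustrates the objective and update being analyzed.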