ICLR 2023 Implicit regularization in Heavy-ball momentum accelerated stochastic gradient descent Oral

In-Person Oral presentation / top 25% paper

Avrajit Ghosh · HE LYU · Xitong Zhang · Rongrong Wang

Auditorium

Oral 5 Track 5: Deep Learning and representational learning & Reinforcement Learning

Abstract: It is well known that the finite step size ($h$) in gradient descent (GD) implicitly regularizes solutions toward flatter minima. A natural question to ask is: does the momentum parameter $\beta$ play a role in implicit regularization in heavy-ball (H.B) momentum accelerated gradient descent (GD+M)? To answer this question, we first show that the trajectory traced by the discrete H.B momentum update (GD+M) is $O(h^2)$ close to a continuous trajectory induced by a modified loss, which consists of the original loss and an implicit regularizer. This implicit regularizer for (GD+M) is stronger than that of (GD) by a factor of $\frac{1+\beta}{1-\beta}$, thus explaining why (GD+M) shows better generalization performance and higher test accuracy than (GD). Furthermore, we extend our analysis to the stochastic version of gradient descent with momentum (SGD+M) and propose a deterministic continuous trajectory that is $O(h^2)$ close to the discrete update of (SGD+M) in a strong approximation sense. We explore the implicit regularization in (SGD+M) and (GD+M) through a series of experiments validating our theory.
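For readers unfamiliar with the update the abstract refers to, the following is a minimal sketch of a heavy-ball iteration together with the factor $\frac{1+\beta}{1-\beta}$ quoted above. It assumes one common textbook form of the update (velocity $v \leftarrow \beta v - h \nabla L(\theta)$, then $\theta \leftarrow \theta + v$); the paper's exact parameterization may differ. The function names `heavy_ball` and `regularizer_ratio` and the toy quadratic loss are hypothetical and chosen only for illustration. The sketch does not compute the implicit regularizer itself, only the ratio stated in the abstract.

```python
def heavy_ball(grad, theta0, h, beta, steps):
    """Run a heavy-ball iteration on a scalar parameter.

    Assumed update form (one common convention, not taken from the paper):
        v     <- beta * v - h * grad(theta)
        theta <- theta + v
    With beta = 0 this reduces to plain gradient descent with step size h.
    """
    theta, v = float(theta0), 0.0
    for _ in range(steps):
        v = beta * v - h * grad(theta)
        theta = theta + v
    return theta


def regularizer_ratio(beta):
    """Factor (1 + beta) / (1 - beta) quoted in the abstract, for 0 <= beta < 1."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must satisfy 0 <= beta < 1")
    return (1.0 + beta) / (1.0 - beta)


def quadratic_grad(theta):
    """Gradient of the toy loss L(theta) = 0.5 * theta**2 (illustrative only)."""
    return theta


if __name__ == "__main__":
    # Compare plain GD (beta = 0) with heavy-ball momentum on the toy loss.
    theta_gd = heavy_ball(quadratic_grad, 1.0, h=0.1, beta=0.0, steps=200)
    theta_hb = heavy_ball(quadratic_grad, 1.0, h=0.1, beta=0.9, steps=200)
    print("final theta, GD:  ", theta_gd)
    print("final theta, GD+M:", theta_hb)

    # The quoted factor grows quickly as beta approaches 1.
    for beta in (0.0, 0.5, 0.9, 0.99):
        print(f"beta = {beta}: ratio = {regularizer_ratio(beta):.1f}")
```

At $\beta = 0$ the ratio is 1, which matches the plain (GD) case, and at the commonly used value $\beta = 0.9$ it is $(1+0.9)/(1-0.9) = 19$.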
