Gradual Stochastic Gradient Descent: from signSGD to SGD via $\ell_p$ Norm
Jinghui Yuan ⋅ Jiachen Liu ⋅ Feiping Nie
Abstract
The research community has long sought an optimizer that converges as quickly as Adam in the early stage of training while achieving the strong generalization of SGD in the later stage. In this paper, we present a novel and feasible approach toward this goal. Recent studies have shown that Adam can be viewed as a smoothed version of sign Stochastic Gradient Descent (signSGD), i.e., steepest descent under an $\ell_\infty$ norm ball constraint, whereas stochastic gradient descent (SGD) can be regarded as steepest descent under an $\ell_2$ norm ball. Inspired by this perspective, we propose the Gradual Norm Optimization framework and design the Gradual Stochastic Gradient Descent (GSGD) algorithm, which lets the optimizer transition smoothly from signSGD in the early phase to standard SGD at the end of training. GSGD requires modifying only a single line of the original SGD implementation. We conduct preliminary evaluations of GSGD on the CIFAR-10 dataset, and the experimental results show that it converges quickly in the early stage, comparably to Adam and signSGD, while retaining the generalization performance of SGD in the later stage.
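The steepest-descent view above can be made concrete with Hölder's inequality. Under an $\ell_p$ norm ball with dual exponent $q$ ($1/p + 1/q = 1$), the unit-norm direction maximizing $\langle g, d\rangle$ is $d_i = \operatorname{sign}(g_i)\,|g_i|^{q-1} / \|g\|_q^{q-1}$. Setting $p=\infty$ gives the sign vector used by signSGD, and $p=2$ gives the normalized gradient $g/\|g\|_2$. The minimal Python sketch below assumes parameters and gradients are plain lists of floats. The function names and the linear schedule on $1/p$ are hypothetical illustrations only; they are not the authors' GSGD implementation or schedule, which the abstract does not specify.

```python
import math


def dual_exponent(p: float) -> float:
    """Hölder conjugate q of p, so that 1/p + 1/q = 1 (requires p > 1)."""
    if p <= 1:
        raise ValueError("p must be greater than 1")
    if math.isinf(p):
        return 1.0
    return p / (p - 1.0)


def lp_steepest_direction(grad, p):
    """Direction d with ||d||_p = 1 that maximizes <grad, d>.

    d_i = sign(g_i) * |g_i|^(q-1) / ||g||_q^(q-1), by Hölder's inequality.
    p = inf yields sign(g) (signSGD); p = 2 yields g / ||g||_2.
    """
    q = dual_exponent(p)
    norm_q = sum(abs(g) ** q for g in grad) ** (1.0 / q)
    if norm_q == 0.0:
        return [0.0 for _ in grad]
    scale = norm_q ** (q - 1.0)
    # Zero gradient entries contribute a zero component.
    return [math.copysign(abs(g) ** (q - 1.0), g) / scale if g != 0 else 0.0
            for g in grad]


def hypothetical_p_schedule(step, total_steps):
    """Hypothetical schedule: move 1/p linearly from 0 (p = inf) to 1/2 (p = 2)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    inv_p = 0.5 * frac
    return math.inf if inv_p == 0.0 else 1.0 / inv_p


def gradual_step(params, grad, lr, step, total_steps):
    """One descent step along the l_p steepest direction for the scheduled p.

    Relative to a plain (normalized) SGD step, only the direction line changes.
    """
    p = hypothetical_p_schedule(step, total_steps)
    d = lp_steepest_direction(grad, p)
    return [w - lr * di for w, di in zip(params, d)]
```

For example, `lp_steepest_direction([3.0, -4.0, 0.0], math.inf)` returns `[1.0, -1.0, 0.0]`, while `lp_steepest_direction([3.0, -4.0, 0.0], 2.0)` returns `[0.6, -0.8, 0.0]`. The sketch schedules $1/p$ rather than $p$ so that the interpolation has finite endpoints (0 and 1/2) even though the starting norm is $\ell_\infty$.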