

Poster

SGD with memory: fundamental properties and stochastic acceleration

Dmitry Yarotsky · Maksim Velikanov

Hall 3 + Hall 2B #366
Fri 25 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract: An important open problem is the theoretically feasible acceleration of mini-batch SGD-type algorithms on quadratic problems with power-law spectrum. In the non-stochastic setting, the optimal exponent $\xi$ in the loss convergence $L_t \sim C_L t^{-\xi}$ is double that of plain GD and is achievable using Heavy Ball (HB) with a suitable schedule; this no longer works in the presence of mini-batch noise. We address this challenge by considering first-order methods with an arbitrary fixed number $M$ of auxiliary velocity vectors (*memory-$M$ algorithms*). We first prove an equivalence between two forms of such algorithms and describe them in terms of suitable characteristic polynomials. Then we develop a general expansion of the loss in terms of *signal and noise propagators*. Using it, we show that losses of stationary stable memory-$M$ algorithms always retain the exponent $\xi$ of plain GD, but can have different constants $C_L$ depending on their *effective learning rate* that generalizes that of HB. We prove that in memory-1 algorithms we can make $C_L$ arbitrarily small while maintaining stability. As a consequence, we propose a memory-1 algorithm with a time-dependent schedule that we show heuristically and experimentally to improve the exponent $\xi$ of plain SGD.
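To make the *memory-$M$* setting concrete, here is a minimal Python sketch of a memory-1 method: a Heavy-Ball-style update with a single auxiliary velocity vector and time-dependent coefficients, run on a synthetic quadratic with power-law spectrum. The spectrum, the $(\alpha_t, \beta_t)$ schedule, and the mini-batch noise model are illustrative assumptions for this sketch, not the schedule proposed in the paper.

```python
# Minimal sketch (not the authors' code): a memory-1 first-order method, i.e. SGD with
# one auxiliary velocity vector, on a synthetic quadratic with power-law spectrum.
# The spectrum exponent, target, and (alpha_t, beta_t) schedules are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

d, batch_size, steps = 500, 10, 2000
eigvals = np.arange(1, d + 1, dtype=float) ** -2.0              # assumed power-law spectrum
w_star = rng.standard_normal(d) / np.sqrt(np.arange(1, d + 1))  # assumed target

def loss(w):
    # Quadratic loss 0.5 * sum_k lambda_k (w_k - w*_k)^2 in the eigenbasis.
    return 0.5 * np.sum(eigvals * (w - w_star) ** 2)

def minibatch_grad(w):
    # Unbiased stochastic gradient from a random subset of coordinates,
    # a simple stand-in for mini-batch noise.
    idx = rng.choice(d, size=batch_size, replace=False)
    g = np.zeros(d)
    g[idx] = eigvals[idx] * (w[idx] - w_star[idx]) * (d / batch_size)
    return g

w = np.zeros(d)
v = np.zeros(d)                        # the single auxiliary velocity vector (M = 1)
for t in range(1, steps + 1):
    alpha_t = 0.5                      # learning-rate schedule (assumed constant here)
    beta_t = 1.0 - 1.0 / (t + 10.0)    # momentum schedule (assumed form)
    g = minibatch_grad(w)
    v = beta_t * v - alpha_t * g       # velocity update
    w = w + v                          # parameter update
print(f"final loss: {loss(w):.3e}")
```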
