ICLR Poster Efficient Continual Finite-Sum Minimization


Ioannis Mavrothalassitis · Stratis Skoulakis · Leello Dadi · Volkan Cevher

Halle B #122


Abstract: Given a sequence of functions $f_1,\ldots,f_n$ with $f_i:\mathcal{D}\to \mathbb{R}$, finite-sum minimization seeks a point $x^\star \in \mathcal{D}$ minimizing $\sum_{j=1}^n f_j(x)/n$. In this work, we propose a key twist on finite-sum minimization, dubbed *continual finite-sum minimization*, that asks for a sequence of points $x_1^\star, \ldots, x_n^\star \in \mathcal{D}$ such that each $x_i^\star \in \mathcal{D}$ minimizes the prefix-sum $\sum_{j=1}^i f_j(x)/i$. Assuming that each prefix-sum is strongly convex, we develop a first-order continual stochastic variance reduction gradient method ($\mathrm{CSVRG}$) producing an $\epsilon$-optimal sequence with $\tilde{\mathcal{O}}(n/\epsilon^{1/3} + 1/\sqrt{\epsilon})$ *first-order oracle* (FO) calls overall. An FO corresponds to the computation of a single gradient $\nabla f_j(x)$ at a given $x \in \mathcal{D}$ for some $j \in [n]$. Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ FOs that $\mathrm{StochasticGradientDescent}$ requires and the $\mathcal{O}(n^2 \log (1/\epsilon))$ FOs that state-of-the-art variance reduction methods such as $\mathrm{Katyusha}$ require. We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^\alpha\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
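To make the problem statement concrete, the following is a minimal sketch on a toy instance, not the CSVRG method itself. It assumes scalar quadratics $f_j(x) = \tfrac{1}{2}(x - a_j)^2$ (so every prefix-sum is strongly convex and its minimizer is the running mean), and it takes "$\epsilon$-optimal" to mean an objective gap of at most $\epsilon$ at every prefix, which is an assumption about the precise definition. The function names and the `fo_counts` helper are hypothetical; the helper only evaluates the three bounds quoted above with constants and logarithmic factors inside $\tilde{\mathcal{O}}$ dropped.

```python
import math


def prefix_minimizers(a):
    """Exact minimizers x_i^* of the prefix averages for f_j(x) = 0.5 * (x - a_j)**2."""
    minimizers, running_sum = [], 0.0
    for i, a_j in enumerate(a, start=1):
        running_sum += a_j
        # For these quadratics, the mean of a_1..a_i minimizes the i-th prefix average.
        minimizers.append(running_sum / i)
    return minimizers


def prefix_objective(a, i, x):
    """Value of the i-th prefix average (1-indexed): sum_{j<=i} f_j(x) / i."""
    return sum(0.5 * (x - a_j) ** 2 for a_j in a[:i]) / i


def is_eps_optimal(a, xs, eps):
    """Check that each xs[i-1] is within eps (in objective value) of the i-th prefix optimum.

    Assumption: eps-optimality is read as a per-prefix objective gap of at most eps.
    """
    stars = prefix_minimizers(a)
    return all(
        prefix_objective(a, i, xs[i - 1]) - prefix_objective(a, i, stars[i - 1]) <= eps
        for i in range(1, len(a) + 1)
    )


def fo_counts(n, eps):
    """Evaluate the quoted FO bounds, ignoring constants and log factors hidden by O-tilde."""
    return {
        "csvrg": n / eps ** (1.0 / 3.0) + 1.0 / math.sqrt(eps),
        "sgd": n / eps,
        "variance_reduced": n ** 2 * math.log(1.0 / eps),
    }


if __name__ == "__main__":
    a = [0.0, 2.0, 4.0]
    print(prefix_minimizers(a))
    print(is_eps_optimal(a, [0.0, 0.0, 0.0], eps=0.1))
    print(fo_counts(n=1000, eps=1e-6))
```

The point of the toy instance is that a continual solver must track a moving target: reusing $x_1^\star$ for every prefix fails the check as soon as the running mean drifts, while recomputing each prefix from scratch is what drives the quadratic dependence on $n$ in the variance-reduction baseline.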
