ICLR Poster Aggregated Momentum: Stability Through Passive Damping

James Lucas · Shengyang Sun · Richard Zemel · Roger Grosse

Great Hall BC #61

Keywords: [ optimization ] [ deep learning ] [ momentum ] [ neural networks ]

Abstract:

Momentum is a simple and widely used trick which allows gradient-based optimizers to pick up speed along low-curvature directions. Its performance depends crucially on a damping coefficient. Large damping coefficients can potentially deliver much larger speedups, but are prone to oscillations and instability; hence one typically resorts to small values such as 0.5 or 0.9. We propose Aggregated Momentum (AggMo), a variant of momentum which combines multiple velocity vectors with different damping coefficients. AggMo is trivial to implement, but significantly dampens oscillations, enabling it to remain stable even for aggressive damping coefficients such as 0.999. We reinterpret Nesterov's accelerated gradient descent as a special case of AggMo and analyze rates of convergence for quadratic objectives. Empirically, we find that AggMo is a suitable drop-in replacement for other momentum methods, and frequently delivers faster convergence with little to no tuning.
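
The abstract does not spell out the update rule, so the following is only a minimal sketch of the idea of combining several velocity vectors. It assumes each velocity is decayed by its own damping coefficient and that the parameter step uses the average of the velocities. The function names, the toy quadratic objective, the learning rate, and the coefficient set (0, 0.9, 0.99) are illustrative choices, not taken from this page.

```python
def aggmo_minimize(grad, theta, betas=(0.0, 0.9, 0.99), lr=0.1, steps=2000):
    """Hypothetical aggregated-momentum loop over a list of floats.

    Assumed update (not stated in the abstract):
        v_i   <- beta_i * v_i - grad(theta)      for each damping coefficient
        theta <- theta + lr * mean_i(v_i)
    """
    theta = list(theta)
    # One velocity vector per damping coefficient, all starting at zero.
    velocities = [[0.0] * len(theta) for _ in betas]
    for _ in range(steps):
        g = grad(theta)
        # Every velocity sees the same gradient but decays at its own rate.
        for v, beta in zip(velocities, betas):
            for j in range(len(theta)):
                v[j] = beta * v[j] - g[j]
        # The parameter step is the average over all velocities.
        for j in range(len(theta)):
            theta[j] += lr * sum(v[j] for v in velocities) / len(betas)
    return theta


def quadratic_grad(theta, curvatures=(1.0, 0.01)):
    """Gradient of the toy objective 0.5 * sum(h_j * theta_j ** 2)."""
    return [h * x for h, x in zip(curvatures, theta)]


# One high-curvature and one low-curvature direction, both starting at 1.0.
theta_final = aggmo_minimize(quadratic_grad, [1.0, 1.0])
```

With a single coefficient of 0 this sketch reduces to plain gradient descent, and with a single nonzero coefficient it reduces to classical momentum, which is one way to sanity-check it against an existing optimizer.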
