ICLR A Coefficient Makes SVRG Effective

Poster in Workshop: Bridging the Gap Between Practice and Theory in Deep Learning

A Coefficient Makes SVRG Effective

Yida Yin · Zhiqiu Xu · Zhiyuan Li · Trevor Darrell · Zhuang Liu

Abstract: Stochastic Variance Reduced Gradient (SVRG), introduced by Johnson & Zhang (2013), is a theoretically compelling optimization method. However, as Defazio & Bottou (2019) highlight, its effectiveness in deep learning is yet to be proven. In this work, we demonstrate the potential of SVRG in optimizing real-world neural networks. Our analysis finds that, for deeper networks, the strength of the variance reduction term in SVRG should be smaller and decrease as training progresses. Inspired by this, we introduce a multiplicative coefficient α to control the strength and adjust it through a linear decay schedule. We name our method α-SVRG. Our results show α-SVRG better optimizes neural networks, consistently reducing training loss compared to both baseline and the standard SVRG across various architectures and image classification datasets. We hope our findings encourage further exploration into variance reduction techniques in deep learning. Code is available at the anonymous GitHub repository https://github.com/abc-092/alpha-SVRG.
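
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the coefficient simply scales the standard SVRG correction term (the snapshot mini-batch gradient minus the snapshot full gradient) and that the linear schedule decays from an initial value to zero over training. The function names `linear_decay` and `alpha_svrg_gradient`, and all parameter names, are hypothetical and chosen only for illustration.

```python
def linear_decay(alpha0, step, total_steps):
    """Coefficient that falls linearly from alpha0 at step 0 to 0 at total_steps.

    Assumption: the schedule ends at zero; the abstract only says "linear decay".
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return alpha0 * (1.0 - frac)


def alpha_svrg_gradient(grad_cur, grad_snap, full_grad_snap, alpha):
    """Scaled variance-reduced gradient estimate (elementwise over lists).

    grad_cur       : mini-batch gradient at the current weights
    grad_snap      : gradient of the same mini-batch at the snapshot weights
    full_grad_snap : full-dataset gradient at the snapshot weights
    alpha          : strength of the variance reduction term
                     (alpha = 1 gives standard SVRG, alpha = 0 gives plain SGD)
    """
    return [
        g - alpha * (gs - mu)
        for g, gs, mu in zip(grad_cur, grad_snap, full_grad_snap)
    ]


# Toy usage with made-up gradient values.
alpha = linear_decay(alpha0=0.5, step=50, total_steps=100)
estimate = alpha_svrg_gradient([1.0, 2.0], [0.5, 1.0], [0.25, 0.5], alpha)
```

Under these assumptions, setting the coefficient to 1 recovers the standard SVRG estimate and setting it to 0 falls back to the plain mini-batch gradient, so the schedule interpolates between the two as training progresses.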
