

Poster in Workshop: Bridging the Gap Between Practice and Theory in Deep Learning

Simulating the Implicit Effect of Learning Rates in Gradient Descent

Adrian Goldwaser · Bruno Mlodozeniec · Hong Ge


Abstract:

Training neural networks can be very sensitive to hyperparameter choice, in particular the learning rate. This choice often has large effects on final generalisation, not just on training speed and convergence. These two effects of the learning rate are entangled, making it difficult to identify the true cause of phenomena in deep learning. Building on previous theoretical work, we empirically show how to disentangle them by simulating one learning rate with a smaller one such that performance remains constant. We show where this method works and what can cause it to break down. We apply this method to the problem of learning rate decay to better understand its effect.
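
The abstract does not spell out the simulation procedure. One way to realise the idea, drawn from prior theoretical work on implicit gradient regularisation (full-batch GD with step size η approximately follows gradient flow on the modified loss L(w) + (η/4)·||∇L(w)||²), is to take steps with a smaller learning rate on a loss augmented with that penalty so the implicit effect of the larger rate is reproduced explicitly. The sketch below illustrates this idea only; the function name, the (η_large − η_small)/4 coefficient, and the toy loss are assumptions for illustration, not the authors' implementation.

```python
import torch

def simulated_gd_step(params, loss_fn, eta_small, eta_large):
    """One GD step with step size eta_small on a modified loss that, to
    first order, mimics plain GD with the larger step size eta_large.

    Assumes the implicit-gradient-regularisation correction
    L(w) + lambda * ||grad L(w)||^2 with lambda = (eta_large - eta_small) / 4.
    """
    loss = loss_fn(params)
    # Keep the graph so the squared-gradient-norm penalty is differentiable.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    penalty = sum((g ** 2).sum() for g in grads)
    modified_loss = loss + (eta_large - eta_small) / 4.0 * penalty
    total_grads = torch.autograd.grad(modified_loss, params)
    with torch.no_grad():
        for p, g in zip(params, total_grads):
            p -= eta_small * g
    return loss.detach()

# Toy usage on a simple non-quadratic loss (illustrative only).
w = [torch.randn(10, requires_grad=True)]
loss_fn = lambda ps: (ps[0] ** 2).sum() + 0.1 * (ps[0] ** 4).sum()
for step in range(100):
    simulated_gd_step(w, loss_fn, eta_small=0.01, eta_large=0.1)
```

Under this reading, matching the large-rate trajectory would also require proportionally more small-rate steps; whether and how the paper accounts for this is not stated in the abstract.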
