Poster in Workshop: Bridging the Gap Between Practice and Theory in Deep Learning
Simulating the Implicit Effect of Learning Rates in Gradient Descent
Adrian Goldwaser · Bruno Mlodozeniec · Hong Ge
Abstract:
Training neural networks can be very sensitive to hyperparameter choice, in particular the learning rate. This choice often has large effects on final generalisation, not just on training speed and convergence. These two effects of the learning rate are entangled, which makes it difficult to identify the true cause of many phenomena in deep learning. Building on previous theoretical work, we empirically show how to disentangle them by simulating one learning rate with a smaller one such that performance remains constant. We show where this method works and what can cause it to break down, and we apply it to the problem of learning rate decay to better understand its effect.
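A minimal sketch of the "simulate one learning rate with a smaller one" idea, assuming the modified-loss construction from backward error analysis of full-batch gradient descent (e.g. Barrett & Dherin, 2021): gradient descent with step size h approximately follows gradient flow on L(θ) + (h/4)‖∇L(θ)‖², so taking many small steps on that modified loss should mimic the trajectory of large-step training. The toy loss, step sizes, and helper names below are illustrative and not the authors' actual experimental setup.

```python
import jax
import jax.numpy as jnp

# Toy non-quadratic loss standing in for a neural-network training loss.
def loss(theta):
    return jnp.sum(theta**4) / 4.0 + jnp.sum(theta**2) / 2.0

# Modified loss from backward error analysis of gradient descent:
# L_tilde(theta) = L(theta) + (h / 4) * ||grad L(theta)||^2,
# whose gradient flow tracks GD with step size h up to O(h^2).
def modified_loss(theta, h):
    g = jax.grad(loss)(theta)
    return loss(theta) + (h / 4.0) * jnp.sum(g**2)

def gd(grad_fn, theta0, lr, steps):
    # Plain full-batch gradient descent.
    theta = theta0
    for _ in range(steps):
        theta = theta - lr * grad_fn(theta)
    return theta

theta0 = jnp.array([1.5, -2.0])
h_large, h_small = 0.1, 0.01

# Gradient descent with the large learning rate.
theta_large = gd(jax.grad(loss), theta0, h_large, steps=100)

# "Simulated" large learning rate: many small steps on the modified loss,
# covering the same total time (steps * lr) as the large-step run.
theta_sim = gd(jax.grad(lambda t: modified_loss(t, h_large)),
               theta0, h_small, steps=1000)

print("large-step GD:   ", theta_large)
print("simulated large h:", theta_sim)
```

Because both runs cover the same total "time" (number of steps times step size), comparing them isolates the implicit regularisation effect of the large learning rate from its effect on how far training has progressed.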