

Poster in Workshop: Bridging the Gap Between Practice and Theory in Deep Learning

Stochastic restarting to overcome overfitting in neural networks

Yeongwoo Song · Youngkyoung Bae · Hawoong Jeong


Abstract:

At times, you may feel like giving up on your neural networks and starting all over. But is restarting truly beneficial in the training process? In this paper, we propose the Stochastic Restarting at checkpoint (Sto-Re) algorithm, designed to overcome overfitting in neural networks by restarting all parameters from a checkpoint. We map the dynamics of stochastic gradient descent (SGD) to Langevin dynamics and introduce a stochastic restarting strategy, which has been actively studied in statistical physics as a means of finding a target faster. Our theoretical analysis shows that incorporating the Sto-Re algorithm into SGD locates optimal parameters more efficiently than ordinary SGD. Furthermore, we demonstrate that the Sto-Re algorithm is particularly advantageous when the stochasticity of SGD increases. Our results provide evidence of its ability to improve performance and generalization across a range of network architectures, datasets, and optimizers in overfitting scenarios. The Sto-Re algorithm is also easy to implement, making it accessible for practical use, and we envision it as a valuable tool that can be employed alongside other training protocols. Ultimately, our findings suggest that the Sto-Re algorithm holds significant potential for enhancing the training of neural networks.
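The abstract does not spell out the update rule, but a minimal sketch of the core idea might look like the following PyTorch loop. The per-step restart probability `RESTART_PROB` and the fixed checkpointing interval `CHECKPOINT_EVERY` are hypothetical choices for illustration; the paper's actual restart rate and checkpoint criterion (e.g., checkpointing at the best validation loss) may differ.

```python
import copy
import torch
import torch.nn as nn

# Hypothetical hyperparameters -- not taken from the paper.
RESTART_PROB = 0.01      # per-step probability of restarting from the checkpoint
CHECKPOINT_EVERY = 100   # how often to refresh the checkpoint (in steps)

# Toy regression setup so the sketch runs end to end.
torch.manual_seed(0)
X = torch.randn(512, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(512, 1)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

checkpoint = copy.deepcopy(model.state_dict())  # initial checkpoint

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

    # Periodically store a checkpoint to restart from.
    if step % CHECKPOINT_EVERY == 0:
        checkpoint = copy.deepcopy(model.state_dict())

    # Stochastic restart: with small probability, reset all parameters
    # to the stored checkpoint and continue training from there.
    if torch.rand(1).item() < RESTART_PROB:
        model.load_state_dict(checkpoint)
```

Because the restart only reloads parameters that were already saved, it composes cleanly with other training protocols, which is consistent with the ease-of-implementation claim above.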
