

Poster in Workshop: 2nd Workshop on Mathematical and Empirical Understanding of Foundation Models

Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok

Tikeng Notsawo Pascal Junior · Hattie Zhou · Mohammad Pezeshki · Irina Rish · Guillaume Dumas


Abstract:

This paper presents a cost-effective method for predicting grokking, the delayed onset of perfect generalization that follows overfitting or memorization, in neural networks. By analyzing the learning curve of the first few training epochs, we show that certain oscillations forecast grokking under extended training. Our approach detects these oscillations efficiently via the spectral signature of the learning curve, obtained with the Fourier transform. Additional experiments explore the origin of these oscillations and characterize the loss landscape of models that grok.
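The paper's exact detection procedure is not spelled out on this page; as a rough illustration of the idea, the sketch below (all function names, the detrending step, and the top-k choice are assumptions, not the authors' implementation) takes an early training-loss curve, applies NumPy's FFT, and reports the dominant oscillation frequencies that a spectral signature would be built from.

```python
import numpy as np

def spectral_signature(loss_curve, top_k=5):
    """Return the strongest oscillation frequencies (cycles per step)
    and their amplitudes for a 1-D array of early training losses."""
    steps = np.arange(len(loss_curve))
    # Remove the slow linear trend so the spectrum highlights
    # oscillations rather than the overall decrease of the loss.
    trend = np.polyval(np.polyfit(steps, loss_curve, deg=1), steps)
    detrended = loss_curve - trend

    spectrum = np.abs(np.fft.rfft(detrended))
    freqs = np.fft.rfftfreq(len(detrended))  # default spacing d=1 step

    # Skip the DC component and keep the top-k peaks by amplitude.
    order = np.argsort(spectrum[1:])[::-1][:top_k] + 1
    return freqs[order], spectrum[order]

# Toy example: exponential decay plus a small oscillation of period 50 steps.
steps = np.arange(2000)
toy_loss = np.exp(-steps / 800) + 0.02 * np.sin(2 * np.pi * steps / 50)
peak_freqs, peak_amps = spectral_signature(toy_loss)
print(peak_freqs, peak_amps)  # expect a peak near 1/50 = 0.02 cycles/step
```

In practice one would compare the amplitude of such peaks against a threshold calibrated on runs known to grok versus runs that do not; that calibration is not described here and is left out of the sketch.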
