Poster in Workshop: 2nd Workshop on Mathematical and Empirical Understanding of Foundation Models
Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok
Tikeng Notsawo Pascal Junior · Hattie Zhou · Mohammad Pezeshki · Irina Rish · Guillaume Dumas
Abstract:
This paper presents a cost-effective method for predicting grokking in neural networks, i.e., delayed perfect generalization that follows overfitting or memorization. By analyzing the learning curve of the first few epochs, we show that certain oscillations forecast grokking under extended training. Our approach detects these oscillations efficiently through the \emph{spectral signature} obtained via the Fourier transform of the learning curve. Additional experiments explore the origins of these oscillations and characterize the loss landscape.
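To make the idea concrete, here is a minimal sketch of how one might compute a Fourier-based spectral signature of an early training-loss curve and score its oscillations. The function names, the de-trending step, the frequency cutoff, and the energy-ratio score are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def spectral_signature(train_loss, skip=10):
    """Amplitude spectrum of a de-meaned early training-loss curve.

    `train_loss`: 1-D array of per-step (or per-epoch) losses from early
    training. `skip` drops the initial steep descent so the spectrum reflects
    oscillations rather than the overall downward trend. (Illustrative choice.)
    """
    x = np.asarray(train_loss[skip:], dtype=float)
    x = x - x.mean()                      # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))     # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1)  # frequencies in cycles per step
    return freqs, spectrum

def oscillation_score(train_loss, low_freq_cutoff=0.05):
    """Heuristic score: spectral energy above `low_freq_cutoff` relative to total.

    A larger score means more pronounced oscillations in the early loss curve,
    the kind of signal the paper associates with later grokking. The cutoff and
    the ratio are assumptions made for this sketch.
    """
    freqs, spectrum = spectral_signature(train_loss)
    high = spectrum[freqs >= low_freq_cutoff].sum()
    return high / (spectrum.sum() + 1e-12)

# Example: a smooth decaying loss vs. the same loss with small oscillations.
steps = np.arange(500)
smooth = np.exp(-steps / 100)
oscillating = smooth + 0.02 * np.sin(2 * np.pi * 0.2 * steps)
print(oscillation_score(smooth), oscillation_score(oscillating))
```

In this toy comparison the oscillatory curve receives a much higher score than the smooth one, which is the kind of contrast a spectral-signature detector would exploit when deciding, early in training, whether grokking is likely under extended training.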