Poster
in
Workshop: Bridging the Gap Between Practice and Theory in Deep Learning

Measuring Sharpness in Grokking

Jack Miller · Patrick Gleeson · Noam Levi · Charles O'Neill · Thang Bui


Abstract:

Neural networks sometimes undergo grokking, where they achieve perfect or near-perfect performance on a validation set well after the same performance has been achieved on the corresponding training set. It is also typically observed that the transition from poor to good validation performance is quite sharp. In this workshop paper, we introduce a robust technique for measuring grokking, based on fitting an appropriate functional form. We use this to investigate sharpness in two settings. The first setting is the theoretical framework developed by Levi et al. (2023), where closed-form expressions are readily accessible. The second setting is a two-layer MLP trained to predict the parity of bits, with grokking induced by the concealment strategy of Miller et al. (2023). We find that the trends between the relative grokking gap and the grokking sharpness are similar in both settings under both absolute and relative sharpness measures. This may indicate that there is some shared mechanism in these cases that is not explained by current theory.
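The abstract does not specify the functional form used; as an illustrative sketch only, one common choice for a sharp transition is a logistic curve in training steps. The snippet below fits a two-parameter logistic (midpoint `t0`, rate `k`) to a synthetic validation-accuracy curve by grid-search least squares, then reads off an absolute sharpness as the slope at the midpoint (`k/4` for a logistic). All names and the sharpness definition here are assumptions, not the authors' method.

```python
import math

def sigmoid(t, t0, k):
    """Logistic curve: accuracy rises from 0 to 1 around step t0 at rate k."""
    return 1.0 / (1.0 + math.exp(-k * (t - t0)))

def fit_sigmoid(steps, accs, t0_grid, k_grid):
    """Grid-search least-squares fit of (t0, k) to an accuracy curve."""
    best = None
    for t0 in t0_grid:
        for k in k_grid:
            sse = sum((sigmoid(t, t0, k) - a) ** 2
                      for t, a in zip(steps, accs))
            if best is None or sse < best[0]:
                best = (sse, t0, k)
    return best[1], best[2]

# Synthetic "grokking" curve: validation accuracy jumps near step 800.
steps = list(range(0, 1600, 25))
accs = [sigmoid(t, 800.0, 0.05) for t in steps]

t0_hat, k_hat = fit_sigmoid(
    steps, accs,
    t0_grid=[700 + 10 * i for i in range(21)],   # 700 .. 900
    k_grid=[0.01 * i for i in range(1, 11)],     # 0.01 .. 0.10
)

# Absolute sharpness: slope of the fitted logistic at its midpoint (k/4).
# A relative measure could, e.g., normalise by the grokking gap.
abs_sharpness = k_hat / 4.0
print(t0_hat, k_hat, abs_sharpness)
```

In practice one would fit with a proper optimiser (e.g. nonlinear least squares) on real training/validation curves; the grid search is only to keep the sketch dependency-free.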