Poster in Workshop: Bridging the Gap Between Practice and Theory in Deep Learning
Bridging Lottery ticket and Grokking: Is Weight Norm Sufficient to Explain Delayed Generalization?
Gouki Minegishi · Yusuke Iwasawa · Yutaka Matsuo
Grokking is the intriguing phenomenon of delayed generalization: a network first reaches a memorization solution, with perfect training accuracy but limited generalization, and only after much further training attains a generalization solution. This paper challenges the earlier view that weight norm reduction explains grokking, demonstrating through experiments that the identification of good subnetworks plays a crucial role in achieving generalization. We draw on the lottery ticket hypothesis to argue that finding these "lottery tickets" is key to the transition from memorization to generalization. We present empirical evidence that (1) with the proper subnetworks, delayed generalization does not occur; (2) dense networks with a similar weight norm still require substantially longer training to reach full generalization; and (3) structure optimization alone, without updating the weight values, can convert a memorization solution into a generalization solution. These results emphasize the importance of subnetwork identification over weight norm reduction in explaining the delayed generalization seen in grokking.
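To make the two quantities contrasted in the abstract concrete, the sketch below computes a weight norm and extracts a subnetwork by applying a binary mask to fixed weights. It is a minimal illustration only: the magnitude-based criterion, the function names (`l2_norm`, `magnitude_mask`, `apply_mask`), and the toy matrix are assumptions made for this example and are not taken from the paper, whose pruning and structure-optimization procedures are not described here.

```python
import math

def l2_norm(weights):
    """L2 norm over all entries of a 2-D weight matrix (list of lists)."""
    return math.sqrt(sum(w * w for row in weights for w in row))

def magnitude_mask(weights, keep_fraction):
    """Binary mask keeping the largest-magnitude entries.

    Assumed criterion for illustration; ties at the threshold are all kept,
    so slightly more than keep_fraction of the entries may survive.
    """
    flat = sorted((abs(w) for row in weights for w in row), reverse=True)
    n_keep = max(1, round(keep_fraction * len(flat)))
    threshold = flat[n_keep - 1]
    return [[1 if abs(w) >= threshold else 0 for w in row] for row in weights]

def apply_mask(weights, mask):
    """Select a subnetwork: weight values are unchanged, only the structure is."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]

# Toy dense layer (hypothetical values).
dense = [[0.9, -0.05, 0.4],
         [0.02, -0.7, 0.1]]

mask = magnitude_mask(dense, keep_fraction=0.5)
sparse = apply_mask(dense, mask)

print("mask:", mask)
print("dense norm:", l2_norm(dense))
print("sparse norm:", l2_norm(sparse))
```

In this toy case the masked network keeps half of the entries while its norm stays close to that of the dense layer, which illustrates why weight norm and subnetwork structure can be examined as separate factors.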