Poster in Workshop: Bridging the Gap Between Practice and Theory in Deep Learning
Momentum Gradient Descent over single-neuron linear network: rich behaviors of limiting Sharpness
WenJie Zhou · Bohan Wang · Wei Chen · Zhi-Ming Ma · Xueqi Cheng
Abstract:
In this work, we explore the training dynamics of momentum gradient descent, specifically Polyak's Heavy Ball method (PHB), applied to the single-neuron linear network model, a framework known for exhibiting phenomena such as Progressive Sharpening and the Edge of Stability when trained with Gradient Descent (GD). In contrast to GD, PHB displays a diverse array of sharpness behaviors in this setting. We find that, with a fixed learning rate $\eta$ and momentum coefficient $\beta$, the limiting sharpness reached from different initializations shows considerable variance. This stands in stark contrast to the limiting sharpness observed in classification tasks on CIFAR10-5k, which generally stabilizes around the \textit{Maximum Stable Sharpness} (MSS) $\frac{2(1+\beta)}{\eta}$. Moreover, our experiments reveal a median limiting sharpness significantly below the MSS. We attribute the high variance and reduced median of the limiting sharpness to the existence of multiple distinct values the sharpness can settle at, depending on the initialization. We also identify a correlation between the attained limiting sharpness and the length of the bouncing period of PHB. Through this investigation, we offer theoretical insights into the trajectory behaviors associated with different bouncing periods. Our findings contribute to a more nuanced understanding of the dynamics of PHB at the Edge of Stability.
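To make the setting concrete, the following is a minimal sketch (not the authors' code) of the PHB update $x_{t+1} = x_t - \eta \nabla L(x_t) + \beta (x_t - x_{t-1})$ on an assumed two-parameter instantiation of a single-neuron linear network with squared loss $L(a,b) = \tfrac{1}{2}(ab - 1)^2$; the target value, function names, and chosen hyperparameters are illustrative assumptions, not the paper's exact setup. The sketch tracks the sharpness (top Hessian eigenvalue) along the trajectory so it can be compared with the MSS $\frac{2(1+\beta)}{\eta}$ across initializations.

```python
# Minimal sketch (assumed setup, not the authors' code): Polyak's Heavy Ball
# on a two-parameter single-neuron linear model f(a, b) = a*b with squared
# loss L = 0.5*(a*b - 1)^2.  Sharpness is the largest Hessian eigenvalue.
import numpy as np

def loss_grad_hess(a, b, y=1.0):
    r = a * b - y
    loss = 0.5 * r ** 2
    grad = np.array([r * b, r * a])
    # Hessian of 0.5*(ab - y)^2: [[b^2, 2ab - y], [2ab - y, a^2]]
    hess = np.array([[b ** 2, 2 * a * b - y],
                     [2 * a * b - y, a ** 2]])
    return loss, grad, hess

def run_phb(a0, b0, eta=0.01, beta=0.9, steps=20000):
    """Run PHB: x_{t+1} = x_t - eta * grad + beta * (x_t - x_{t-1})."""
    x_prev = x = np.array([a0, b0], dtype=float)
    sharpness = []
    for _ in range(steps):
        _, g, H = loss_grad_hess(*x)
        sharpness.append(np.linalg.eigvalsh(H)[-1])  # top Hessian eigenvalue
        x, x_prev = x - eta * g + beta * (x - x_prev), x
    return np.array(sharpness)

if __name__ == "__main__":
    eta, beta = 0.01, 0.9
    mss = 2 * (1 + beta) / eta  # Maximum Stable Sharpness
    # Different initializations can settle at different limiting sharpness
    # values at the same (eta, beta); the initial points below are arbitrary.
    for a0 in (4.0, 8.0, 16.0):
        s = run_phb(a0, 0.1, eta=eta, beta=beta)
        print(f"init a0={a0:5.1f}: limiting sharpness ~ {s[-200:].mean():8.2f}"
              f"  (MSS = {mss:.1f})")
```

Varying the initial point in the loop is only meant to illustrate how the limiting sharpness can differ across initializations at fixed $\eta$ and $\beta$; the specific initializations and step counts here are arbitrary choices for the sketch.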