On the Convergence Direction of Gradient Descent
Abstract
Gradient descent (GD) is a fundamental optimization method in deep learning, yet its dynamics near the Edge of Stability (EoS) remain unclear despite empirical evidence showing that GD often operates in this regime. In this paper, we prove that if GD converges, its trajectory either aligns with a fixed direction or oscillates along a specific line. Fixed-direction convergence occurs under small learning rates, while oscillatory behavior emerges for large learning rates. This result offers a new lens for understanding the long-term dynamics of GD. In particular, it sheds light on the EoS phenomenon, explaining why sharpness oscillates even as the loss converges. Experimentally, we find that this directional convergence behavior also appears in stochastic gradient descent (SGD) and Adam. These findings suggest a broader underlying principle governing the directional structure of optimization trajectories. Our work provides both theoretical clarity and practical insight into the dynamics of multiple optimization methods.
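The two regimes described above can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it runs GD on a simple two-dimensional quadratic loss (the matrix `A`, the helper `run_gd`, and the learning rates are arbitrary choices for demonstration) and tracks the normalized iterate. With a small learning rate the normalized iterate settles to a fixed direction, while with a learning rate near the stability threshold it oscillates along a line.

```python
# Illustrative sketch only (not the paper's construction): GD on the
# quadratic loss L(x) = 0.5 * x^T A x, tracking the normalized iterate.
import numpy as np

A = np.diag([1.0, 10.0])  # eigenvalues 1 and 10; sharpness (top eigenvalue) = 10

def run_gd(lr, steps=200):
    """Run GD from a fixed start and record the normalized iterate at each step."""
    x = np.array([1.0, 1.0])
    directions = []
    for _ in range(steps):
        x = x - lr * (A @ x)            # GD step on the quadratic
        directions.append(x / np.linalg.norm(x))
    return np.array(directions)

# Small learning rate (well below 2/sharpness = 0.2): the sharp component
# decays fastest, so the normalized iterate converges to a fixed direction.
small = run_gd(lr=0.05)
print("small lr:", small[-2], small[-1])   # both approx (1, 0)

# Learning rate close to 2/sharpness: the sharp component decays slowly with
# alternating sign, so the normalized iterate flips between +v and -v,
# oscillating along a line even though the loss still decreases.
large = run_gd(lr=0.19)
print("large lr:", large[-2], large[-1])   # approx (0, +1) and (0, -1)
```

In this toy setting the per-coordinate contraction factors are $1 - \eta \lambda_i$, so the regime is determined by which factor has the largest magnitude and whether it is negative; the paper's result concerns general losses, for which no such closed form is available.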