High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
Shaojie Li · Pengwei Tang · Bowei Zhu · Yong Liu
Abstract
Stochastic gradient descent with momentum (SGDM) is widely used in machine learning, yet high-probability learning bounds for SGDM in non-convex settings remain scarce. In this paper, we provide high-probability convergence and generalization bounds for SGDM. First, we establish such bounds for the gradient norm in the general non-convex case. The resulting convergence bounds are tighter than existing theoretical results, and, to the best of our knowledge, the generalization bounds are the first of their kind for SGDM. Next, under the Polyak-{\L}ojasiewicz (PL) condition, we derive bounds on the function-value error rather than the gradient norm, with correspondingly faster learning rates than in the general non-convex case. Finally, by additionally assuming a mild Bernstein condition on the gradient, we obtain even sharper generalization bounds whose learning rates can reach $\widetilde{\mathcal{O}}(1/n^2)$ in the low-noise regime, where $n$ is the sample size. Overall, we provide a systematic study of high-probability learning bounds for non-convex SGDM.
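For context, a minimal sketch of the two objects referenced in the abstract: a standard SGDM recursion and the PL condition. The exact variant analyzed in the paper (e.g., the momentum averaging scheme or parameter schedules) may differ; the symbols $\eta$, $\beta$, $\xi_t$, and $\mu$ below are standard notation, not taken from the paper.
% Heavy-ball (exponential-moving-average) SGDM update at step t,
% with step size \eta, momentum parameter \beta \in [0,1), and stochastic sample \xi_t:
\begin{align*}
  m_{t+1} &= \beta\, m_t + (1-\beta)\, \nabla f(x_t;\, \xi_t), \\
  x_{t+1} &= x_t - \eta\, m_{t+1}.
\end{align*}
% Polyak-{\L}ojasiewicz (PL) condition with parameter \mu > 0 and minimum value f^\ast:
\begin{equation*}
  \tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^\ast\bigr) \quad \text{for all } x.
\end{equation*}
Under the PL condition, a small gradient norm forces a small function-value gap, which is why the corresponding bounds can be stated for $f(x) - f^\ast$ rather than $\|\nabla f(x)\|$.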