Abstract:
The classical Perceptron algorithm of Rosenblatt can be used to find a linear threshold function that correctly classifies $n$ linearly separable data points, assuming the classes are separated by some margin $\gamma > 0$. A foundational result is that the Perceptron converges after $\Omega(1/\gamma^{2})$ iterations. There have been several recent works that managed to improve this rate by a quadratic factor, to $\Omega(\sqrt{\log n}/\gamma)$, with more sophisticated algorithms. In this paper, we unify these existing results under one framework by showing that they can all be described through the lens of solving min-max problems using modern acceleration techniques, mainly through \emph{optimistic} online learning. We then show that the proposed framework also leads to improved results for a series of problems beyond the standard Perceptron setting. Specifically, a) for the margin maximization problem, we improve the state-of-the-art result from $O(\log t/t^{2})$ to $O(1/t^{2})$, where $t$ is the number of iterations; b) we provide the first result identifying the implicit bias property of the classical Nesterov's accelerated gradient descent (NAG) algorithm, and show that NAG can maximize the margin at an $O(1/t^{2})$ rate; c) for the classical $p$-norm Perceptron problem, we provide an algorithm with an $\Omega(\sqrt{(p-1)\log n}/\gamma)$ convergence rate, whereas existing algorithms suffer an $\Omega((p-1)/\gamma^{2})$ convergence rate.
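For concreteness, the classical Perceptron update referenced above can be sketched as follows. This is a minimal illustrative implementation, not the accelerated methods studied in the paper; the function name `perceptron` and the epoch cap `max_iters` are our own choices.

```python
import numpy as np

def perceptron(X, y, max_iters=10_000):
    """Classical Rosenblatt Perceptron (illustrative sketch).

    X: (n, d) array of data points; y: (n,) labels in {-1, +1}.
    If the data are linearly separable with margin gamma (and norms
    bounded by 1), the classical mistake bound guarantees convergence
    within on the order of 1/gamma^2 updates.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        made_mistake = False
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:  # misclassified (or on the boundary)
                w += yi * xi             # Perceptron update: move w toward yi*xi
                made_mistake = True
        if not made_mistake:             # a full pass with no mistakes: converged
            return w
    return w
```

On separable data, the returned $w$ satisfies $y_i \langle w, x_i\rangle > 0$ for all points; the accelerated variants discussed in the paper reduce the number of such updates from $\Omega(1/\gamma^2)$ to $\Omega(\sqrt{\log n}/\gamma)$.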