Abstract:
We study the role that a finite timescale separation parameter τ plays in gradient descent-ascent for non-convex, non-concave zero-sum games, where the learning rate of player 1 is denoted by γ1 and the learning rate of player 2 is defined as γ2 = τγ1. We give a non-asymptotic construction of a finite timescale separation parameter τ∗ such that gradient descent-ascent locally converges to a critical point x∗ for all τ ∈ (τ∗, ∞) if and only if x∗ is a strict local minmax equilibrium. Moreover, we provide explicit local convergence rates as a function of the finite timescale separation. These convergence results are complemented by a non-convergence result: given a critical point x∗ that is not a strict local minmax equilibrium, we construct a finite timescale separation τ0 such that gradient descent-ascent with timescale separation τ ∈ (τ0, ∞) does not converge to x∗. Finally, we extend these results to gradient penalty regularization methods for generative adversarial networks and demonstrate empirically on CIFAR-10 and CelebA that timescale separation has a significant impact on training performance.
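For concreteness, the update studied in the abstract is simultaneous gradient descent-ascent with learning rates coupled as γ2 = τγ1. Below is a minimal sketch of that iteration; the quadratic objective `f`, the initial point, and the constants `gamma1` and `tau` are illustrative assumptions for this sketch, not quantities taken from the paper.

```python
import jax
import jax.numpy as jnp

# Hypothetical zero-sum objective f(x1, x2): player 1 minimizes, player 2 maximizes.
def f(x1, x2):
    return 0.5 * x1 ** 2 + 2.0 * x1 * x2 - 0.5 * x2 ** 2

grad_x1 = jax.grad(f, argnums=0)  # gradient for the descent (minimizing) player
grad_x2 = jax.grad(f, argnums=1)  # gradient for the ascent (maximizing) player

def gda_step(x1, x2, gamma1, tau):
    gamma2 = tau * gamma1  # learning rates coupled by the timescale separation tau
    x1_next = x1 - gamma1 * grad_x1(x1, x2)  # descent step with learning rate gamma1
    x2_next = x2 + gamma2 * grad_x2(x1, x2)  # ascent step with learning rate gamma2 = tau * gamma1
    return x1_next, x2_next

# Illustrative run: iterate from an arbitrary starting point with a chosen tau.
x1, x2 = jnp.array(1.0), jnp.array(1.0)
for _ in range(1000):
    x1, x2 = gda_step(x1, x2, gamma1=0.01, tau=10.0)
```

In this sketch, sweeping `tau` controls how much faster player 2 adapts than player 1, which is the knob whose finite threshold behavior the abstract characterizes.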