Abstract:
We study the role that a finite timescale separation parameter τ plays in gradient descent-ascent for non-convex, non-concave zero-sum games, where the learning rate of player 1 is denoted by γ1 and the learning rate of player 2 is defined as γ2 = τγ1. We give a non-asymptotic construction of a finite timescale separation parameter τ∗ such that gradient descent-ascent locally converges to a critical point x∗ for all τ ∈ (τ∗, ∞) if and only if x∗ is a strict local minmax equilibrium. Moreover, we provide explicit local convergence rates as a function of the finite timescale separation. These convergence results are complemented by a non-convergence result: given a critical point x∗ that is not a strict local minmax equilibrium, we construct a finite timescale separation τ0 such that gradient descent-ascent with timescale separation τ ∈ (τ0, ∞) does not converge to x∗. Finally, we extend these results to gradient penalty regularization methods for generative adversarial networks and demonstrate empirically on CIFAR-10 and CelebA that timescale separation has a significant impact on training performance.
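For concreteness, the update studied in the abstract is simultaneous gradient descent-ascent with learning rates coupled as γ2 = τγ1. Below is a minimal sketch of that iteration; the quadratic objective `f`, the initial point, and the constants `gamma1` and `tau` are illustrative assumptions for this sketch, not quantities taken from the paper.

```python
import jax
import jax.numpy as jnp

# Hypothetical zero-sum objective f(x1, x2): player 1 minimizes, player 2 maximizes.
def f(x1, x2):
    return 0.5 * x1 ** 2 + 2.0 * x1 * x2 - 0.5 * x2 ** 2

grad_x1 = jax.grad(f, argnums=0)  # gradient for the descent (minimizing) player
grad_x2 = jax.grad(f, argnums=1)  # gradient for the ascent (maximizing) player

def gda_step(x1, x2, gamma1, tau):
    gamma2 = tau * gamma1  # learning rates coupled by the timescale separation tau
    x1_next = x1 - gamma1 * grad_x1(x1, x2)  # descent step with learning rate gamma1
    x2_next = x2 + gamma2 * grad_x2(x1, x2)  # ascent step with learning rate gamma2 = tau * gamma1
    return x1_next, x2_next

# Illustrative run: iterate from an arbitrary starting point with a chosen tau.
x1, x2 = jnp.array(1.0), jnp.array(1.0)
for _ in range(1000):
    x1, x2 = gda_step(x1, x2, gamma1=0.01, tau=10.0)
```

In this sketch, sweeping `tau` controls how much faster player 2 adapts than player 1, which is the knob whose finite threshold behavior the abstract characterizes.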