Poster
MGDA Converges under Generalized Smoothness, Provably
Qi Zhang · Peiyao Xiao · Shaofeng Zou · Kaiyi Ji
Hall 3 + Hall 2B #601
Sat 26 Apr midnight PDT — 2:30 a.m. PDT
Abstract:
Multi-objective optimization (MOO) is receiving increasing attention in fields such as multi-task learning. Recent works provide effective algorithms with theoretical analyses, but they rely on the standard L-smoothness or bounded-gradient assumptions, which typically do not hold for neural networks such as long short-term memory (LSTM) models and Transformers. In this paper, we study a more general and realistic class of generalized ℓ-smooth loss functions, where ℓ is a general non-decreasing function of the gradient norm. We revisit and analyze the fundamental multiple gradient descent algorithm (MGDA) and its stochastic version with double sampling for solving generalized ℓ-smooth MOO problems; these methods approximate the conflict-avoidant (CA) direction that maximizes the minimum improvement among objectives. We provide a comprehensive convergence analysis of these algorithms and show that they converge to an ϵ-accurate Pareto stationary point with a guaranteed ϵ-level average CA distance (i.e., the gap between the updating direction and the CA direction) over all iterations, where O(ϵ⁻²) and O(ϵ⁻⁴) samples in total are needed in the deterministic and stochastic settings, respectively. We further prove that, using more samples, they can guarantee a tighter ϵ-level CA distance in each iteration. Moreover, we analyze an efficient variant of MGDA named MGDA-FA that uses only O(1) time and space while achieving the same performance guarantee as MGDA.
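For intuition only (this is not the paper's implementation): the CA direction that MGDA approximates is the min-norm point in the convex hull of the objective gradients, which has a simple closed form when there are exactly two objectives. A minimal NumPy sketch, with `mgda_direction_2` a hypothetical helper name:

```python
import numpy as np

def mgda_direction_2(g1, g2):
    """Min-norm convex combination of two gradients (closed form).

    Returns (lam, d) with d = lam*g1 + (1-lam)*g2 and lam in [0, 1]
    minimizing ||d||; d is the conflict-avoidant direction for two
    objectives, satisfying d·g_i >= ||d||^2 for i = 1, 2.
    """
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        lam = 0.5  # gradients coincide; any weight gives the same d
    else:
        lam = float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return lam, lam * g1 + (1.0 - lam) * g2
```

For conflicting gradients g1 = (1, 0) and g2 = (0, 1), this returns lam = 0.5 and d = (0.5, 0.5), a descent direction for both objectives; with more than two objectives, MGDA solves the analogous min-norm problem over the probability simplex.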