Spotlight Poster

Implicit bias of SGD in $L_2$-regularized linear DNNs: One-way jumps from high to low rank

Zihan Wang · Arthur Jacot

Halle B #152

Abstract: The $L_{2}$-regularized loss of Deep Linear Networks (DLNs) with more than one hidden layer has multiple local minima, corresponding to matrices with different ranks. In tasks such as matrix completion, the goal is to converge to the local minimum with the smallest rank that still fits the training data. While rank-underestimating minima can be avoided, since they do not fit the data, GD might get stuck at rank-overestimating minima. We show that with SGD, there is always a positive probability of jumping from a higher-rank minimum to a lower-rank one, but the probability of jumping back is zero. More precisely, we define a sequence of sets $B_{1}\subset B_{2}\subset\cdots\subset B_{R}$ such that $B_{r}$ contains all minima of rank $r$ or less (and no minima of higher rank) and is absorbing for small enough ridge parameters $\lambda$ and learning rates $\eta$: SGD has probability 0 of leaving $B_{r}$, and from any starting point there is a non-zero probability for SGD to enter $B_{r}$.
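The training setup the abstract refers to can be illustrated with a minimal NumPy sketch: minibatch SGD on a deep linear network $f(x) = W_3 W_2 W_1 x$ (two hidden layers) with an $L_2$ (ridge) penalty of strength $\lambda$ on each factor, fitting data generated by a low-rank linear map. All dimensions, the learning rate $\eta$, the ridge parameter $\lambda$, and the initialization scale are illustrative assumptions, not the paper's experimental configuration; the final singular values of the end-to-end product indicate the effective rank SGD settles on.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (all sizes and hyperparameters are illustrative assumptions):
d, n = 6, 200                        # input/output dimension, number of samples
A_star = rng.standard_normal((d, 2)) @ rng.standard_normal((2, d))  # rank-2 target
X = rng.standard_normal((n, d))
Y = X @ A_star.T

# Deep linear network with two hidden layers: f(x) = W3 @ W2 @ W1 @ x
W1 = 0.4 * rng.standard_normal((d, d))
W2 = 0.4 * rng.standard_normal((d, d))
W3 = 0.4 * rng.standard_normal((d, d))

eta, lam, batch = 0.003, 1e-3, 16    # learning rate, ridge parameter, minibatch size

for step in range(20000):
    idx = rng.choice(n, size=batch, replace=False)
    Xb, Yb = X[idx], Y[idx]
    P = W3 @ W2 @ W1                 # end-to-end linear map
    E = Xb @ P.T - Yb                # minibatch residuals
    G = E.T @ Xb / batch             # gradient of the data-fit term w.r.t. P
    # Chain rule through the factors, plus the L2 (ridge) term on each factor:
    gW3 = G @ (W2 @ W1).T + lam * W3
    gW2 = W3.T @ G @ W1.T + lam * W2
    gW1 = (W3 @ W2).T @ G + lam * W1
    W1 -= eta * gW1
    W2 -= eta * gW2
    W3 -= eta * gW3

# Singular values of the learned end-to-end map: the ridge term and SGD noise
# push the small singular values toward zero, i.e. toward a low-rank minimum.
sv = np.linalg.svd(W3 @ W2 @ W1, compute_uv=False)
print(np.round(sv, 3))
```

The one-way jumps described in the abstract concern exactly this kind of trajectory: once the product's effective rank drops (entering some $B_r$), the SGD noise at small $\eta$ and $\lambda$ cannot restore the lost directions.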