In-Person Poster presentation / poster accept

Scaling Forward Gradient With Local Losses

Mengye Ren · Simon Kornblith · Renjie Liao · Geoffrey E Hinton

MH1-2-3-4 #42

Keywords: [ Deep Learning and representational learning ]

Tue 2 May 2:30 a.m. PDT — 4:30 a.m. PDT


Forward gradient learning computes a noisy directional gradient and is a biologically plausible alternative to backprop for learning deep neural networks. The standard forward gradient algorithm suffers from the curse of dimensionality in the number of parameters. In this paper, we propose to scale forward gradient by adding a large number of local greedy loss functions. We consider block-wise, patch-wise, and channel group-wise local losses, and show that activity perturbation reduces variance compared to weight perturbation. Inspired by MLPMixer, we also propose a new architecture, LocalMixer, that is more suitable for local learning. We find local learning can work well with both supervised classification and self-supervised contrastive learning. Empirically, it can match backprop on MNIST and CIFAR-10 and significantly outperform backprop-free algorithms on ImageNet.
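To illustrate the core idea behind forward gradient learning, here is a minimal sketch of the basic (weight-perturbation) forward gradient estimator: sample a random direction, take the directional derivative of the loss along it (what forward-mode AD would return), and scale the direction by that scalar. The quadratic loss and all names below are illustrative assumptions, not the paper's implementation; the paper's contribution is reducing this estimator's variance via activity perturbation and many local losses.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(w):
    # Analytic gradient of the toy loss L(w) = 0.5 * ||w||^2.
    # Stands in for what forward-mode AD (a JVP) would provide
    # without ever materializing the full gradient.
    return w

def forward_gradient(w, rng):
    # Sample a random perturbation direction v ~ N(0, I).
    v = rng.standard_normal(w.shape)
    # Directional derivative d = grad(L) . v (the JVP output).
    d = loss_grad(w) @ v
    # Unbiased estimate: E[(grad . v) v] = grad for v ~ N(0, I).
    return d * v

w = rng.standard_normal(5)
# Averaging many single-sample estimates recovers the true gradient;
# a single sample is noisy, and the variance grows with dimension,
# which is the "curse of dimensionality" the paper addresses.
est = np.mean([forward_gradient(w, rng) for _ in range(200_000)], axis=0)
```

A single call to `forward_gradient` is what one training step uses; the averaging loop here only demonstrates unbiasedness. Because the per-sample variance scales with the number of perturbed parameters, splitting the network into many small local losses (as the paper proposes) keeps each estimate low-dimensional and usable.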
