In-Person Poster presentation / poster accept
How gradient estimator variance and bias impact learning in neural networks
Arna Ghosh · Yuhan Helena Liu · Guillaume Lajoie · Konrad P Kording · Blake A Richards
MH1-2-3-4 #133
Keywords: [ credit assignment ] [ learning and plasticity ] [ Gradient approximation ] [ computational neuroscience ] [ Neuromorphic computing ] [ Imperfect gradient descent ] [ Biologically-plausible learning ] [ neural networks ] [ Neuroscience and Cognitive Science ]
There is growing interest in understanding how real brains may approximate gradients and how gradients can be used to train neuromorphic chips. However, neither real brains nor neuromorphic chips can perfectly follow the loss gradient, so parameter updates would necessarily use gradient estimators that have some variance and/or bias. There is therefore a need to better understand how variance and bias in gradient estimators affect learning as a function of network and task properties. Here, we show that variance and bias can impair learning on the training data, but that some degree of variance and bias in a gradient estimator can be beneficial for generalization. We find that the ideal amount of variance and bias in a gradient estimator depends on several properties of the network and task: the size and activity sparsity of the network, the norm of the gradient, and the curvature of the loss landscape. As such, whether considering biologically-plausible learning algorithms or algorithms for training neuromorphic chips, researchers can analyze these properties to determine whether their approximation to gradient descent will be effective for learning on their network and task.
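As a rough illustration of the training-loss claim above, the following minimal sketch runs gradient descent on a toy quadratic loss with an imperfect gradient estimator. Everything in it is an assumption made for illustration and is not taken from the paper: the quadratic loss, the per-coordinate curvature values, the additive constant bias, the Gaussian noise model, and the names `noisy_gradient` and `train`. It only touches the effect on training loss and says nothing about generalization.

```python
import random

def loss(w, curvature):
    # Toy quadratic loss: 0.5 * sum_i h_i * w_i^2 (hypothetical, for illustration).
    return 0.5 * sum(h * x * x for h, x in zip(curvature, w))

def true_gradient(w, curvature):
    # Exact gradient of the quadratic loss above.
    return [h * x for h, x in zip(curvature, w)]

def noisy_gradient(w, curvature, bias, noise_std, rng):
    # Assumed estimator model: true gradient + constant bias + Gaussian noise.
    return [g + bias + rng.gauss(0.0, noise_std)
            for g in true_gradient(w, curvature)]

def train(bias=0.0, noise_std=0.0, steps=500, lr=0.1, seed=0):
    # Plain gradient descent driven by the (possibly imperfect) estimator.
    rng = random.Random(seed)
    curvature = [1.0, 0.5, 0.25, 0.1]  # assumed curvature along each axis
    w = [1.0] * len(curvature)
    for _ in range(steps):
        g_hat = noisy_gradient(w, curvature, bias, noise_std, rng)
        w = [x - lr * g for x, g in zip(w, g_hat)]
    return loss(w, curvature)

if __name__ == "__main__":
    print("exact gradient :", train())
    print("biased estimate:", train(bias=0.1))
    print("noisy estimate :", train(noise_std=0.5))
```

In this toy setting, a constant bias shifts the point that gradient descent converges to, and the shift is largest along low-curvature directions, while noise keeps the parameters fluctuating around the minimum. Both leave the final training loss above what the exact gradient reaches.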