

Poster in Workshop: Workshop on Distributed and Private Machine Learning

Talk Less, Smile More: Reducing Communication with Distributed Auto-Differentiation

Bradley Baker · Vince Calhoun · Barak Pearlmutter · Sergey Plis


Abstract:

The gradient has long been the most common shared statistic for distributed machine learning; however, distributed deep neural networks (DNNs) tend to be large, so transmitting gradients can consume considerable bandwidth. Methods such as sparsification and quantization have emerged to reduce this cost, but the focus remains on compressing the gradient rather than sharing a different statistic altogether. Here, we present an unexplored shift away from gradients toward a statistic that is more communication-friendly yet still grounded in mathematically correct optimization. The process, inspired by auto-differentiation, also provides unique insight into how gradients are composed from outer products. This insight can be further exploited to obtain a low-rank approximation of the gradients, which reduces communication still further while approximating the gradient more accurately than other low-rank compression methods.
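
The abstract's key observation, that a dense layer's weight gradient is a sum of outer products of input activations and backpropagated deltas, can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: the layer sizes, the idea of exchanging the two factor matrices, and the SVD-based rank truncation below are illustrative assumptions, intended only to show why transmitting the outer-product factors (or low-rank versions of them) can cost far less than transmitting the full gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a batch of 32 samples and a dense layer mapping 512 -> 256 units.
batch, n_in, n_out = 32, 512, 256

# For a dense layer, the weight gradient is the sum of per-sample outer products
# of input activations and backpropagated deltas:  dW = A^T @ D,
# with A of shape (batch, n_in) and D of shape (batch, n_out).
A = rng.standard_normal((batch, n_in))   # input activations at this layer
D = rng.standard_normal((batch, n_out))  # backpropagated error signals

full_gradient = A.T @ D                  # shape (n_in, n_out): what is usually shared

# Communicating A and D instead of dW costs batch * (n_in + n_out) floats
# versus n_in * n_out floats for the full gradient.
cost_gradient = n_in * n_out
cost_factors = batch * (n_in + n_out)
print(f"floats sent: gradient={cost_gradient}, factors={cost_factors}")

# A rank-r truncation of each factor (here via SVD, purely for illustration)
# cuts communication further; in practice only the rank-r factors would be sent,
# and the receiver would rebuild an approximate gradient from them.
r = 8
Ua, Sa, Vta = np.linalg.svd(A, full_matrices=False)
Ud, Sd, Vtd = np.linalg.svd(D, full_matrices=False)
A_r = (Ua[:, :r] * Sa[:r]) @ Vta[:r]
D_r = (Ud[:, :r] * Sd[:r]) @ Vtd[:r]
approx_gradient = A_r.T @ D_r
err = np.linalg.norm(full_gradient - approx_gradient) / np.linalg.norm(full_gradient)
print(f"relative error of rank-{r} reconstruction: {err:.3f}")
```

With these example sizes the factor exchange sends 24,576 floats per layer versus 131,072 for the full gradient; how aggressively the rank can be truncated without hurting the gradient estimate is exactly the trade-off the paper studies.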
