Poster in Workshop: Bridging the Gap Between Practice and Theory in Deep Learning
Breaking the Dimension Dependence in Sketching for Distributed Learning
Berivan Isik · Qiaobo Li · Mayank Shrivastava · Arindam Banerjee · Sanmi Koyejo
Abstract:
The high communication cost between the server and the clients is a significant bottleneck in scaling distributed learning for modern overparameterized deep models. One popular approach to reduce this cost is linear sketching, where the sender projects the updates into a lower dimension before communication and the receiver desketches them before any subsequent computation. While sketched distributed learning is known to scale effectively in practice, existing theoretical analyses suggest that the convergence error depends on the ambient dimension -- impacting scalability. This paper aims to shed light on this apparent mismatch between theory and practice. Our main result is a tighter analysis that eliminates the dimension dependence in sketching without imposing unrealistic restrictive assumptions on the distributed learning setup. Using the approximate restricted strong smoothness property of overparameterized deep models and the second-order geometry of the loss, we present optimization results for single-local-step and $K$-local-step distributed learning, along with bounds on communication complexity, with implications for analyzing and implementing distributed learning for overparameterized deep models.
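To make the sketch/desketch protocol described above concrete, below is a minimal illustrative sketch, not the authors' method: a client compresses its $d$-dimensional update with a random Gaussian sketching matrix, and the server applies the transposed matrix to obtain an unbiased linear reconstruction. All names (`make_sketch_matrix`, `sketch_dim`, etc.) and the choice of Gaussian sketches are assumptions for illustration.

```python
# Hypothetical illustration of linear sketching for distributed learning:
# the client sends S @ g (sketch_dim numbers) instead of g (d numbers),
# and the server reconstructs with S.T, which is unbiased since
# E[S.T @ S] = I when entries of S are N(0, 1/sketch_dim).
import numpy as np

def make_sketch_matrix(sketch_dim: int, ambient_dim: int, seed: int) -> np.ndarray:
    """Gaussian sketching matrix S with E[S.T @ S] = I."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0 / np.sqrt(sketch_dim), size=(sketch_dim, ambient_dim))

def sketch(S: np.ndarray, update: np.ndarray) -> np.ndarray:
    """Client side: project the d-dimensional update down to sketch_dim numbers."""
    return S @ update

def desketch(S: np.ndarray, sketched: np.ndarray) -> np.ndarray:
    """Server side: map back to d dimensions (unbiased linear reconstruction)."""
    return S.T @ sketched

# Toy usage: d = 10,000 parameters compressed to b = 500 numbers per round.
d, b = 10_000, 500
S = make_sketch_matrix(b, d, seed=0)
g = np.random.default_rng(1).standard_normal(d)   # a client's gradient update
g_hat = desketch(S, sketch(S, g))                  # server's reconstruction
print(np.dot(g, g_hat) / np.dot(g, g))             # close to 1 in expectation
```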