Poster in Workshop: Bridging the Gap Between Practice and Theory in Deep Learning
Breaking the Dimension Dependence in Sketching for Distributed Learning
Berivan Isik · Qiaobo Li · Mayank Shrivastava · Arindam Banerjee · Sanmi Koyejo
Abstract:
The high communication cost between the server and the clients is a significant bottleneck in scaling distributed learning for modern overparameterized deep models. One popular approach to reduce this cost is linear sketching, where the sender projects the updates into a lower dimension before communication and the receiver desketches them before any subsequent computation. While sketched distributed learning is known to scale effectively in practice, existing theoretical analyses suggest that the convergence error depends on the ambient dimension -- impacting scalability. This paper aims to shed light on this apparent mismatch between theory and practice. Our main result is a tighter analysis that eliminates the dimension dependence in sketching without imposing unrealistic restrictive assumptions on the distributed learning setup. Using the approximate restricted strong smoothness property of overparameterized deep models and the second-order geometry of the loss, we present optimization results for single-local-step and $K$-local-step distributed learning, along with bounds on communication complexity, with implications for analyzing and implementing distributed learning for overparameterized deep models.
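To make the sketch/desketch protocol described above concrete, below is a minimal illustrative sketch, not the authors' method: a client compresses its $d$-dimensional update with a random Gaussian sketching matrix, and the server applies the transposed matrix to obtain an unbiased linear reconstruction. All names (`make_sketch_matrix`, `sketch_dim`, etc.) and the choice of Gaussian sketches are assumptions for illustration.

```python
# Hypothetical illustration of linear sketching for distributed learning:
# the client sends S @ g (sketch_dim numbers) instead of g (d numbers),
# and the server reconstructs with S.T, which is unbiased since
# E[S.T @ S] = I when entries of S are N(0, 1/sketch_dim).
import numpy as np

def make_sketch_matrix(sketch_dim: int, ambient_dim: int, seed: int) -> np.ndarray:
    """Gaussian sketching matrix S with E[S.T @ S] = I."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0 / np.sqrt(sketch_dim), size=(sketch_dim, ambient_dim))

def sketch(S: np.ndarray, update: np.ndarray) -> np.ndarray:
    """Client side: project the d-dimensional update down to sketch_dim numbers."""
    return S @ update

def desketch(S: np.ndarray, sketched: np.ndarray) -> np.ndarray:
    """Server side: map back to d dimensions (unbiased linear reconstruction)."""
    return S.T @ sketched

# Toy usage: d = 10,000 parameters compressed to b = 500 numbers per round.
d, b = 10_000, 500
S = make_sketch_matrix(b, d, seed=0)
g = np.random.default_rng(1).standard_normal(d)   # a client's gradient update
g_hat = desketch(S, sketch(S, g))                  # server's reconstruction
print(np.dot(g, g_hat) / np.dot(g, g))             # close to 1 in expectation
```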