Poster in Workshop: Modular, Collaborative and Decentralized Deep Learning
Exploring Asynchronism in SWARM Parallelism
Yan Zuo · Gil Avraham · Thalaiyasingam Ajanthan · Sameera Ramasinghe · Alexander Long
Abstract:
SWARM parallelism is a framework that enhances pipeline parallelism in distributed training by incorporating fault tolerance. However, the synchronous nature of this approach introduces inefficiencies that can hinder performance and scalability. We analyze these inefficiencies and propose an asynchronous modification to the framework that enables nodes to perform local updates and periodically average their states. Our results demonstrate that this modified asynchronous SWARM achieves higher throughput without sacrificing model convergence.
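The local-update-with-periodic-averaging scheme described above resembles local SGD. Below is a minimal, self-contained sketch of that idea, simulating several workers that each take independent gradient steps and then average their parameters; the quadratic objective, hyperparameters, and worker structure are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of local updates with periodic state averaging.
# The loss, hyperparameters, and simulated-worker setup are assumptions,
# not the paper's actual training code.
import numpy as np

rng = np.random.default_rng(0)

NUM_WORKERS = 4   # assumed swarm size
LOCAL_STEPS = 8   # assumed local steps between averaging rounds
ROUNDS = 50
LR = 0.1

# Each worker minimizes f(w) = 0.5 * ||w - target||^2 with noisy gradients,
# standing in for a data shard held by that worker.
target = np.array([1.0, -2.0, 0.5])
workers = [np.zeros(3) for _ in range(NUM_WORKERS)]

for _ in range(ROUNDS):
    # Asynchronous phase: each worker takes LOCAL_STEPS independent SGD steps
    # without waiting on the others.
    for k in range(NUM_WORKERS):
        w = workers[k]
        for _ in range(LOCAL_STEPS):
            grad = (w - target) + 0.1 * rng.standard_normal(3)
            w = w - LR * grad
        workers[k] = w
    # Periodic synchronization: average the worker states (a mean all-reduce).
    mean_w = np.mean(workers, axis=0)
    workers = [mean_w.copy() for _ in range(NUM_WORKERS)]

print("final parameters:", np.round(workers[0], 3), "target:", target)
```

Because workers only synchronize every LOCAL_STEPS updates rather than at every step, stragglers stall the group less often, which is the throughput benefit the abstract claims for the asynchronous variant.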