Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging
Pierre Ablin ⋅ Angelos Katharopoulos ⋅ Skyler Seto ⋅ David Grangier
2025 Oral in Workshop: Modular, Collaborative and Decentralized Deep Learning