

Oral in Affinity Workshop: Tiny Papers Oral Session 4

Training Mixture-of-Experts: A Focus on Expert-Token Matching

Masoumeh Zareapoor


Abstract:

Recent advances in sparse Mixture-of-Experts (MoE) models, particularly the Vision MoE (VMoE) framework, have demonstrated promising results on vision tasks. However, a key challenge persists: routing tokens (such as image patches) to the right experts without incurring excessive computational cost. To address this, we apply regularized optimal transport, computed with the Sinkhorn algorithm, to the VMoE framework to improve the token-expert matching process. The resulting model, Sinkhorn-VMoE (SVMoE), represents a meaningful step toward optimizing the efficiency and effectiveness of sparsely-gated MoE models.
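To illustrate the idea, here is a minimal sketch of Sinkhorn-normalized token-expert matching. This is not the authors' implementation: the function name, the entropic-regularization strength `eps`, the iteration count, and the choice of top-1 routing after normalization are all illustrative assumptions.

```python
import numpy as np

def sinkhorn_match(scores, n_iters=50, eps=0.1):
    """Balance a token-expert score matrix with Sinkhorn iterations.

    Alternately rescales rows (tokens) and columns (experts) of
    exp(scores / eps) so that each token distributes one unit of mass
    and each expert receives roughly n_tokens / n_experts of it,
    approximating an entropy-regularized optimal transport plan.
    """
    n_tokens, n_experts = scores.shape
    P = np.exp(scores / eps)  # eps controls the entropic regularization
    for _ in range(n_iters):
        P = P / P.sum(axis=1, keepdims=True)       # each token's row sums to 1
        P = P / P.sum(axis=0, keepdims=True)       # normalize expert columns...
        P = P * (n_tokens / n_experts)             # ...to an equal share of tokens
    return P

# Hypothetical usage: 16 image-patch tokens, 4 experts.
rng = np.random.default_rng(0)
logits = rng.normal(size=(16, 4))                  # raw gating scores
P = sinkhorn_match(logits)
assignment = P.argmax(axis=1)                      # top-1 route per token
```

The column-normalization step is what distinguishes this from a plain softmax gate: it pushes the plan toward a balanced load across experts, which is the load-balancing property that motivates replacing greedy routing with regularized optimal transport.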
