

Oral in Affinity Workshop: Tiny Papers Oral Session 4

Training Mixture-of-Experts: A Focus on Expert-Token Matching

Masoumeh Zareapoor


Abstract:

Recent advances in sparse Mixture-of-Experts (MoE) models, particularly the Vision MoE (VMoE) framework, have demonstrated promising results on vision tasks. However, a key challenge persists: routing tokens (such as image patches) to the right experts without incurring excessive computational cost. To address this, we apply regularized optimal transport, computed with the Sinkhorn algorithm, to the VMoE framework to improve the token-expert matching process. The resulting model, Sinkhorn-VMoE (SVMoE), represents a meaningful step toward optimizing the efficiency and effectiveness of sparsely-gated MoE models.
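To illustrate the idea, here is a minimal sketch of Sinkhorn-normalized token-expert matching. This is not the authors' implementation: the function name, the entropic-regularization strength `eps`, the iteration count, and the choice of top-1 routing after normalization are all illustrative assumptions.

```python
import numpy as np

def sinkhorn_match(scores, n_iters=50, eps=0.1):
    """Balance a token-expert score matrix with Sinkhorn iterations.

    Alternately rescales rows (tokens) and columns (experts) of
    exp(scores / eps) so that each token distributes one unit of mass
    and each expert receives roughly n_tokens / n_experts of it,
    approximating an entropy-regularized optimal transport plan.
    """
    n_tokens, n_experts = scores.shape
    P = np.exp(scores / eps)  # eps controls the entropic regularization
    for _ in range(n_iters):
        P = P / P.sum(axis=1, keepdims=True)       # each token's row sums to 1
        P = P / P.sum(axis=0, keepdims=True)       # normalize expert columns...
        P = P * (n_tokens / n_experts)             # ...to an equal share of tokens
    return P

# Hypothetical usage: 16 image-patch tokens, 4 experts.
rng = np.random.default_rng(0)
logits = rng.normal(size=(16, 4))                  # raw gating scores
P = sinkhorn_match(logits)
assignment = P.argmax(axis=1)                      # top-1 route per token
```

The column-normalization step is what distinguishes this from a plain softmax gate: it pushes the plan toward a balanced load across experts, which is the load-balancing property that motivates replacing greedy routing with regularized optimal transport.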
