Invited Talk in Workshop: I Can't Believe It's Not Better: Where Large Language Models need to improve
Mon, Apr 27, 2026 • 11:30 AM – 12:05 PM PDT

Invited Talk by Sewon Min: Are Mixture-of-Experts Modular? Why It Matters and How to Fix It


Abstract

Mixture-of-Experts (MoE) models are designed as modular architectures, but are they functionally modular, i.e., do they allow subsets of experts to be used independently for downstream domains? We argue that they are not, and that this gap matters: as MoEs grow larger, sparser, and more fine-grained, they become increasingly difficult to use, adapt, and fine-tune without heavy infrastructure. We introduce ModMoE, a self-supervised approach that makes modularity a first-class property, without relying on human priors and without loss in overall performance. ModMoE induces semantically specialized experts rather than a lexical partitioning, and it enables effective selective use of experts across pool sizes, improving efficiency and performance in both zero-shot inference and fine-tuning. These results point toward more accessible and flexible MoEs, and a path to large-scale, sparse, and truly modular expert architectures.
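To make "using an expert subset independently" concrete, here is a minimal sketch of generic top-k MoE routing for a single token, with an optional `keep` set that restricts routing to a subset of experts (as if only those experts were loaded). This is an illustration under stated assumptions, not ModMoE's method: the function names, the `keep` masking rule, and the toy experts are hypothetical, the router logits are passed in directly rather than computed by a learned router layer, and weights are renormalized with a softmax over only the selected experts.

```python
import math
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(
    logits: Sequence[float], k: int, keep: Optional[Iterable[int]] = None
) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Hypothetical subset rule: if `keep` is given, experts outside it are
    excluded before top-k selection, so only the retained subset is used.
    """
    allowed = None if keep is None else set(keep)
    candidates = [i for i in range(len(logits)) if allowed is None or i in allowed]
    if len(candidates) < k:
        raise ValueError("fewer retained experts than k")
    top = sorted(candidates, key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])  # renormalize over selected experts
    return list(zip(top, weights))


def moe_forward(
    x: Vector,
    logits: Sequence[float],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int,
    keep: Optional[Iterable[int]] = None,
) -> Vector:
    """Weighted sum of the outputs of the routed experts for one token."""
    out = [0.0] * len(x)
    for i, w in route_top_k(logits, k, keep):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Toy experts (hypothetical): expert s scales its input by (s + 1).
experts = [lambda v, s=s: [(s + 1) * t for t in v] for s in range(4)]
logits = [2.0, 0.5, 1.5, -1.0]

full = route_top_k(logits, k=2)                    # all four experts available
subset = route_top_k(logits, k=2, keep={1, 2, 3})  # expert 0 dropped
y = moe_forward([1.0, -1.0], logits, experts, k=2, keep={1, 2, 3})
```

With all experts available, the token is routed to experts 0 and 2; once expert 0 is removed, routing falls back to experts 2 and 1. In a model that is not functionally modular, such a fallback can change behavior unpredictably, which is the gap the talk addresses.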

Speaker

Sewon Min
