Oral #1: Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models

Vimal Thilak


