Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient

Jan Ludziejewski ⋅ Maciej Pióro ⋅ Jakub Krajewski ⋅ Michał Krutul ⋅ Jan Małaśnicki ⋅ Maciej Stefaniak ⋅ Piotr Sankowski ⋅ Marek Cygan ⋅ Kamil Adamczewski ⋅ Piotr Miłoś ⋅ Sebastian Jaszczur

Abstract