On the Surprising Efficacy of Distillation as an Alternative to Pre-Training Small Models

Sean Farhat · Deming Chen

Abstract