

Is More Data Worth the Cost? Dataset Scaling Laws in a Tiny Attention-Only Decoder

Wiegand ⋅ Lorena Raichle ⋅ Rico Staedeli ⋅ Tomas Hrycej ⋅ Bernhard Bermeitinger ⋅ Siegfried Handschuh

Abstract
