Contributed Talk - Less Data, Faster Training: sampling bias from small dataset can speed up training
Jingwen Liu
Abstract
This work investigates the "small-vs-large gap": training on fewer samples can save compute compared with training on a larger dataset. The gap is observed across algorithmic tasks, architectures, and optimizers, and cannot be explained by prior theory. We argue that the speedup comes from appropriate layer-wise norm growth enabled by sampling biases, an effect that is more pronounced when the dataset is smaller. We support this with both theoretical analysis and empirical evidence from various interventions. Together, our results highlight the underexplored potential of jointly considering different resources.