On Neural Scaling Laws for Weather Emulation through Continual Learning
Abstract
Neural scaling laws, which predict the performance of large neural networks as a function of model, data, and compute scale, are a cornerstone of building foundation models in applications such as natural language processing and computer vision. Here, we study the scaling behavior of Transformers in a scientific setting: emulating atmospheric physics for weather forecasting. We focus on continual learning with constant learning rates and periodic cooldowns as a practical training strategy for studying scaling at a manageable compute cost. We show that models trained in this way follow predictable scaling trends and consistently outperform models trained with standard cosine learning rate schedules. We also demonstrate that cooldown phases can be repurposed to improve downstream performance, for example to obtain accurate multi-step rollouts over long time horizons and sharper forecasts through spectral loss adjustments. Finally, we conduct scaling experiments across a wide range of model and dataset sizes under various compute constraints to identify compute-optimal training regimes and characterize the resulting scaling behavior. Our code is open source.
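
To make the contrast between the two training strategies concrete, the following is a minimal sketch of a cosine schedule next to a constant learning rate with a cooldown that can branch off at any checkpoint. It is an illustration, not the released implementation: the function names, the linear warmup, the linear cooldown shape, and all parameter names (`peak_lr`, `cooldown_start`, `cooldown_steps`, `min_lr`) are assumptions made for this example.

```python
import math


def cosine_lr(step, total_steps, peak_lr, warmup_steps=0, min_lr=0.0):
    """Cosine schedule: the decay is tied to a fixed, pre-chosen total_steps."""
    if warmup_steps > 0 and step < warmup_steps:
        # Linear warmup (assumed) up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def constant_with_cooldown_lr(step, peak_lr, warmup_steps=0,
                              cooldown_start=None, cooldown_steps=0, min_lr=0.0):
    """Constant learning rate with an optional cooldown branch.

    With cooldown_start=None the schedule stays at peak_lr indefinitely, so
    training can continue from any checkpoint. Setting cooldown_start branches
    off a decay of length cooldown_steps (linear shape assumed here).
    """
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    if cooldown_start is None or step < cooldown_start:
        return peak_lr
    frac = min(1.0, (step - cooldown_start) / max(1, cooldown_steps))
    return peak_lr + frac * (min_lr - peak_lr)


if __name__ == "__main__":
    # Hypothetical values, chosen only to print both schedules side by side.
    for s in (0, 500, 1000, 1050, 1100):
        trunk = constant_with_cooldown_lr(s, peak_lr=1e-3)
        branch = constant_with_cooldown_lr(s, peak_lr=1e-3,
                                           cooldown_start=1000, cooldown_steps=100)
        cos = cosine_lr(s, total_steps=1100, peak_lr=1e-3)
        print(s, trunk, branch, cos)
```

In this sketch the constant-rate trunk does not depend on a total step count, so a single long run can be evaluated at several compute budgets by launching a short cooldown from each checkpoint of interest, whereas the cosine schedule must be rerun for every budget.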