Lost in the Non-convex Loss Landscape: How to Fine-tune the Large Time Series Model?
Abstract
Recently, large time series models (LTSMs) have become popular and important because they exhibit characteristics similar to large language models, such as flexible context length, scalability, and task generality, outperforming advanced task-specific models in the domain. However, existing research indicates that a pre-trained LTSM can exhibit a sharp, non-convex loss landscape, which indicates poor trainability. Hence, directly fine-tuning a pre-trained LTSM leads to overfitting and poor fine-tuning performance, sometimes even worse than training from scratch on the downstream dataset. This severely diminishes the value of the pre-trained LTSM. To address this, we propose a new fine-tuning method called Smoothed Full Fine-tuning (SFF). Specifically, before fine-tuning, we first construct a randomly initialized auxiliary LTSM, whose loss landscape is smooth (indicating good trainability). Second, we use it to smooth the loss landscape of the pre-trained LTSM via linear interpolation between their weights. As a result, the smoothed LTSM acquires good trainability while retaining the pre-trained knowledge, thereby achieving better performance when fine-tuned on the downstream dataset. We also explain why SFF is effective from the perspective of optimization theory: the interpolation perturbs sharp minima without noticeably harming originally flat regions, thereby helping the model escape sharp minima into better, smoother basins. Extensive experiments on popular datasets show that our method improves the performance of eight popular LTSMs, namely Timer, TimesFM, MOMENT, UniTS, MOIRAI, Chronos, TTMs, and Sundial, across different downstream tasks.
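The interpolation step at the core of SFF can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dict-of-scalars weight representation, the initialization scale, and the mixing coefficient `alpha` are all assumptions for the sketch; a real LTSM would interpolate tensors in the model's state dict the same way, entry by entry.

```python
import random

def random_init(pretrained_weights, std=0.02):
    """Auxiliary 'model': randomly initialized weights with the same
    parameter names as the pre-trained model (hypothetical init scale)."""
    return {name: random.gauss(0.0, std) for name in pretrained_weights}

def smooth(pretrained, auxiliary, alpha=0.9):
    """Linear interpolation between the two weight sets:
    alpha * pretrained + (1 - alpha) * auxiliary.
    Larger alpha keeps more pre-trained knowledge; smaller alpha
    borrows more of the auxiliary model's smooth landscape."""
    return {name: alpha * pretrained[name] + (1 - alpha) * auxiliary[name]
            for name in pretrained}

# Toy example with two scalar "parameters".
pretrained = {"w1": 0.5, "w2": -1.2}
aux = random_init(pretrained)
smoothed = smooth(pretrained, aux, alpha=0.9)
```

The smoothed weights would then replace the pre-trained ones before standard full fine-tuning on the downstream dataset.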