Beyond Earthbound Benchmarks: Evaluating Time Series Foundation Models on Satellite Telemetry Data
Abstract
Time series foundation models (TSFMs) excel on terrestrial benchmarks but remain untested in the space domain. We present the first systematic zero-shot evaluation of four TSFMs (Credence, TimesFM-2.5, Chronos-2, and TiRex) on the ESA Anomaly Detection Benchmark. The results show that the models generalize to satellite telemetry without fine-tuning, with most outperforming a seasonal naive baseline. Probabilistic accuracy (CRPS) improves at lower sampling frequencies, whereas point accuracy (MASE) degrades significantly. Overall, Chronos-2 achieves the best MASE and Credence the best CRPS. These findings support the use of TSFMs in automated mission operations and establish a baseline for future space forecasting benchmarks.
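For readers unfamiliar with the two metrics, the following is a minimal sketch of how MASE, the seasonal naive baseline, and a sample-based CRPS estimate can be computed. It is an illustration under stated assumptions, not the evaluation code used in this work: the function names, the season length `m`, and the toy values are hypothetical, and the CRPS here uses the standard sample estimator E|X - y| - 0.5 E|X - X'|, which may differ from the estimator applied in the actual evaluation.

```python
from statistics import fmean


def seasonal_naive(history, horizon, m):
    """Forecast by repeating the last full season of `history` (season length m)."""
    last_season = history[-m:]
    return [last_season[h % m] for h in range(horizon)]


def mase(history, actual, forecast, m):
    """Forecast MAE scaled by the in-sample MAE of the seasonal naive method.

    Values below 1 mean the forecast beats the in-sample seasonal naive error.
    """
    scale = fmean(abs(history[t] - history[t - m]) for t in range(m, len(history)))
    mae = fmean(abs(a - f) for a, f in zip(actual, forecast))
    return mae / scale


def crps_from_samples(samples, y):
    """Sample-based CRPS estimate for a single time step (lower is better)."""
    spread_to_obs = fmean(abs(x - y) for x in samples)
    spread_within = fmean(abs(a - b) for a in samples for b in samples)
    return spread_to_obs - 0.5 * spread_within


# Toy values (hypothetical, not taken from the benchmark).
history = [1, 2, 3, 4, 2, 3, 4, 5]
actual = [3, 4, 5, 6, 3, 4]
baseline = seasonal_naive(history, horizon=6, m=4)

print(baseline)                            # seasonal naive forecast
print(mase(history, actual, baseline, 4))  # MASE of the baseline itself
print(crps_from_samples([0.0, 2.0], 1.0))  # CRPS of a two-sample forecast
```

Under this definition, a model outperforms the seasonal naive baseline on a series when its MASE is lower than the baseline's. For a forecast with no spread, the CRPS estimate reduces to the absolute error, which is why the two metrics can be compared on the same scale.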