TS-Arena: A Live Forecast Pre-Registration Platform for Leakage-Free Evaluation of Time Series Foundation Models
Abstract
Time Series Foundation Models (TSFMs) are transforming forecasting, yet their evaluation on historical data is increasingly compromised by overlap between training and test samples and by temporal overlap between correlated training and test series. This paper presents TS-Arena, an infrastructure that benchmarks TSFMs by replacing retrospective testing on historical data with continuous, prospective evaluation. Our core contribution is a strict \emph{Forecast Pre-Registration Protocol} (FPRP): models must submit predictions before the ground-truth data physically exists, making test-set contamination impossible by design. The platform relies on a modular microservice architecture that ingests real-time data streams and orchestrates containerized model submissions under enforced registration windows. By combining pre-registration with continuous evaluation rounds, TS-Arena aims to prevent both direct and indirect information leakage while enabling fast, ongoing model comparison. First results from simulating TS-Arena over one year of energy time series (e.g., renewable energy generation, electricity prices, and district heating cogeneration) demonstrate the viability and discriminative power of leakage-free live evaluation. A prototype is available at http://ucs8sws04cko8o88sso4800s.45.9.61.32.sslip.io.
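To make the pre-registration idea concrete, the following is a minimal Python sketch of how an enforced registration window could be checked. It is an illustration under stated assumptions, not the platform's actual implementation: the names `Round`, `registration_opens`, `registration_closes`, `first_target_time`, and `accept_submission` are hypothetical and do not come from TS-Arena.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Round:
    """One evaluation round (hypothetical structure, not from the paper)."""
    registration_opens: datetime
    registration_closes: datetime
    first_target_time: datetime  # earliest timestamp the models must forecast

    def __post_init__(self):
        # Pre-registration only holds if the window closes no later than the
        # first forecast target, i.e. before any ground truth can exist.
        if not (self.registration_opens
                < self.registration_closes
                <= self.first_target_time):
            raise ValueError(
                "registration window must close before the first target time"
            )


def accept_submission(round_: Round, received_at: datetime) -> bool:
    """Accept a forecast only if it arrived inside the registration window."""
    return round_.registration_opens <= received_at < round_.registration_closes


# Example round: a six-hour window that closes exactly when the horizon starts.
opens = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
example_round = Round(
    registration_opens=opens,
    registration_closes=opens + timedelta(hours=6),
    first_target_time=opens + timedelta(hours=6),
)
on_time = accept_submission(example_round, opens + timedelta(hours=1))
too_late = accept_submission(example_round, opens + timedelta(hours=7))
```

In this sketch, a submission received one hour after the window opens is accepted, while one received after the window has closed is rejected. Rejecting late submissions on the server side, rather than trusting client-reported timestamps, is the design choice that would make the guarantee enforceable.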