Poster in Workshop on Learning from Time Series for Health

TOTEM: Tokenized Time Series Embeddings For General Time Series Analysis

Sabera Talukder · Yisong Yue · Georgia Gkioxari

Keywords: [ time series ] [ representation learning ] [ tokenization ]


Abstract:

Learning with time series health data poses many challenges, such as variability in sensor semantics (e.g., neural voltage recordings vs. US birth rates), difficulty in accessing data, and relatively small data volumes compared to other time series domains. Given these limitations, and the fact that the field of general time series analysis has recently begun to explore unified modeling, we approach unification from a complementary vantage point, ultimately to benefit zero-shot performance on health time series. Historically, unification in general time series analysis has meant retraining a common architectural backbone on a specific task for a specific dataset; we instead study the unification of time series data representations across domains and across many tasks. To this end, we explore the impact of discrete, learned time series data representations that enable generalist, cross-domain training. Our method, TOTEM, or Tokenized Time Series Embeddings, proposes a simple tokenizer architecture that embeds time series data from varying domains using a discrete vectorized representation learned in a self-supervised manner. TOTEM works across multiple tasks and domains with minimal to no tuning. We study TOTEM’s efficacy through an extensive evaluation on 17 real-world time series datasets across 3 tasks. Notably, the majority of our zero-shot datasets are health time series datasets from the neuroscience and birth domains. We evaluate both the specialist regime (i.e., training a model on each domain) and the generalist regime (i.e., training a single model on many domains), and show that TOTEM matches or outperforms previous best methods on several popular benchmarks.
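The abstract describes the tokenizer only at a high level: a discrete, vectorized representation learned in a self-supervised manner. As a rough illustration of that general idea, not of TOTEM's actual architecture, the sketch below implements a generic VQ-VAE-style tokenizer for univariate time series in PyTorch; all names and hyperparameters here (VQTokenizer, codebook_size, compression) are illustrative assumptions.

import torch
import torch.nn as nn


class VQTokenizer(nn.Module):
    """Generic VQ-VAE-style tokenizer for univariate time series (a sketch,
    not TOTEM's implementation). A strided 1D convolutional encoder
    compresses the series, each latent vector is snapped to its nearest
    codebook entry (its discrete token), and a transposed-convolution
    decoder reconstructs the input. Training on reconstruction alone is
    self-supervised: no labels are required."""

    def __init__(self, codebook_size=256, code_dim=64, compression=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, code_dim, kernel_size=compression, stride=compression),
            nn.ReLU(),
            nn.Conv1d(code_dim, code_dim, kernel_size=3, padding=1),
        )
        self.codebook = nn.Embedding(codebook_size, code_dim)
        self.decoder = nn.Sequential(
            nn.Conv1d(code_dim, code_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(code_dim, 1, kernel_size=compression, stride=compression),
        )

    def forward(self, x):
        # x: (batch, 1, time) -> z: (batch, code_dim, time // compression)
        z = self.encoder(x)
        flat = z.permute(0, 2, 1)  # (batch, steps, code_dim)
        # Squared distance from every latent step to every codebook entry.
        dists = ((flat.unsqueeze(2) - self.codebook.weight) ** 2).sum(-1)
        tokens = dists.argmin(-1)  # discrete token ids, (batch, steps)
        quantized = self.codebook(tokens).permute(0, 2, 1)
        # Straight-through estimator: the backward pass treats quantization
        # as the identity, so gradients reach the encoder.
        recon = self.decoder(z + (quantized - z).detach())
        return recon, tokens, z, quantized


def vq_loss(x, recon, z, quantized, beta=0.25):
    # Reconstruction + codebook + commitment terms (standard VQ-VAE loss).
    rec = ((recon - x) ** 2).mean()
    codebook = ((quantized - z.detach()) ** 2).mean()
    commit = ((quantized.detach() - z) ** 2).mean()
    return rec + codebook + beta * commit


model = VQTokenizer()
x = torch.randn(8, 1, 96)               # batch of univariate series
recon, tokens, z, quantized = model(x)
loss = vq_loss(x, recon, z, quantized)
loss.backward()
print(tokens.shape)                     # torch.Size([8, 24])

Once such a tokenizer is trained, each series becomes a short sequence of integer token ids drawn from a shared codebook, which is what allows a single downstream model to consume data from many domains without per-domain retraining.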
