Poster in Workshop: 1st ICLR Workshop on Time Series in the Age of Large Models

Convolutional Tokenization Improves Transformers for Multi-Channel Time Series Classification

Lev Kung ⋅ Brandon Yee ⋅ Maximilian Rutkowski

Abstract

Transformers have shown promise for time series modeling, yet often underperform simpler CNN baselines on multi-channel physiological signals. We hypothesize that this stems from a mismatch between patch-based tokenization and the local structure inherent in such signals. We propose a hybrid architecture that replaces standard patch embedding with convolutional tokenization: a spatial attention module learns channel importance, and multi-scale temporal convolutions then extract local features that are passed to a transformer encoder. On 64-channel EEG classification with 109 subjects, our hybrid model achieves an F1-score of 81.1%, outperforming pure transformers (76.1%), CNN baselines (78.4%), and LSTMs (72.7%). Our results suggest that incorporating convolutional inductive biases into the tokenization stage is crucial for transformers to excel on multi-channel time series.
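
The tokenization stage described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the amplitude-based channel scoring, the kernel sizes (3, 7, 15), the stride, the feature width, and the random weights are all assumptions chosen only to show the data flow from a multi-channel signal to a token sequence. The transformer encoder that would consume the tokens is omitted.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def channel_attention(x, w):
    """Reweight channels by a softmax over per-channel scores.

    x: (channels, time) signal; w: (channels,) hypothetical learned vector.
    The scoring rule (w times mean absolute amplitude) is an assumption.
    """
    scores = w * np.abs(x).mean(axis=1)
    weights = softmax(scores)              # (channels,), sums to 1
    return x * weights[:, None], weights

def temporal_conv(x, kernels, stride):
    """Strided 1D cross-correlation with zero padding.

    x: (channels, time); kernels: (out_dim, channels, k) with odd k.
    Returns (time // stride, out_dim): one feature vector per token.
    """
    out_dim, _, k = kernels.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    n_tokens = x.shape[1] // stride
    out = np.empty((n_tokens, out_dim))
    for t in range(n_tokens):
        window = xp[:, t * stride : t * stride + k]        # (channels, k)
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out

def conv_tokenize(x, w, kernel_bank, stride=8):
    """Channel attention, then multi-scale convolutions concatenated per token."""
    x_att, weights = channel_attention(x, w)
    feats = [np.maximum(temporal_conv(x_att, k, stride), 0.0)  # ReLU
             for k in kernel_bank]
    return np.concatenate(feats, axis=1), weights

# Illustrative sizes only: 64 channels, 160 time steps, three kernel widths.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 160))
w = rng.standard_normal(64)
kernel_bank = [rng.standard_normal((16, 64, k)) * 0.1 for k in (3, 7, 15)]

tokens, weights = conv_tokenize(x, w, kernel_bank)
print(tokens.shape)  # (20, 48): 160 // 8 tokens, 3 scales x 16 features
```

Each row of `tokens` summarizes a local window of the signal at several temporal scales, in contrast to patch embedding, which linearly projects fixed non-overlapping patches. In a full model the rows would receive positional information and be fed to the transformer encoder.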
