On learning linear dynamical systems in context with attention layers
Abstract
This paper studies the expressive power of linear attention layers for in-context learning (ICL) of linear dynamical systems (LDSs). We consider training on sequences of inexact observations generated by noise-corrupted LDSs, where all perturbations are Gaussian. Importantly, this non-i.i.d. data setting is a significant step towards modeling real-world scenarios. We provide the optimal weight construction for a single linear-attention layer and show that it is equivalent to one step of gradient descent on an autoregressive objective with window size one. Guided by experiments, we uncover a connection to a generalization of the preconditioned conjugate gradient method for larger window sizes. We support our findings with numerical evidence. These results add to the existing understanding of transformers' expressivity as in-context learners and offer plausible hypotheses for recent observations that place their performance on par with that of the Kalman filter, the optimal model-dependent learner in this setting.