Learning Transferable Sensor Models via Language-Informed Pretraining
Yuliang Chen ⋅ Arvind Pillai ⋅ Yu Wu ⋅ Tess Griffin ⋅ Lisa Marsch ⋅ Michael Heinz ⋅ Nicholas Jacobson ⋅ Andrew Campbell
Abstract
Sensing systems produce large-scale unlabeled multivariate time series, which makes self-supervised pretraining a practical way to learn transferable representations. Yet many foundation models are trained for forecasting and can miss the semantic structure needed for classification and reasoning. Sensor-language alignment improves semantic transfer, but existing methods often assume fixed sensor inputs, such as predefined channels, lengths, or temporal resolutions, which limits cross-domain use. We introduce $\textbf{SLIP}$ ($\textbf{S}$ensor $\textbf{L}$anguage-$\textbf{I}$nformed $\textbf{P}$retraining), an open-source framework that learns language-aligned representations that generalize across diverse sensor configurations. SLIP combines contrastive alignment with sensor-conditioned captioning, supporting both discriminative understanding and generative reasoning. By repurposing a pretrained decoder-only language model through cross-attention and adding a flexible patch embedder, SLIP handles different temporal resolutions and variable-length inference without additional retraining. Our experiments show that SLIP improves performance on linear probing, zero-shot classification, signal captioning, and question answering.
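To make the combination of contrastive alignment and captioning concrete, the following is a minimal NumPy sketch of one way such a joint objective could be written. It is not the SLIP implementation: the function names, the symmetric InfoNCE form, the temperature of 0.07, and the `caption_weight` parameter are all assumptions chosen for illustration, and the abstract does not specify how the two terms are weighted.

```python
import numpy as np


def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_loss(sensor_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (sensor, text) embeddings.

    Assumption: row i of sensor_emb is paired with row i of text_emb.
    """
    s = sensor_emb / np.linalg.norm(sensor_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = s @ t.T / temperature  # (batch, batch) cosine similarities
    idx = np.arange(logits.shape[0])
    sensor_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    text_to_sensor = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (sensor_to_text + text_to_sensor)


def caption_loss(token_logits, target_ids):
    """Mean next-token cross-entropy.

    token_logits: (batch, seq, vocab) decoder outputs conditioned on the sensor.
    target_ids:   (batch, seq) integer caption tokens.
    """
    log_probs = log_softmax(token_logits, axis=-1)
    picked = np.take_along_axis(log_probs, target_ids[..., None], axis=-1)
    return -picked.mean()


def combined_loss(sensor_emb, text_emb, token_logits, target_ids,
                  caption_weight=1.0):
    """Hypothetical joint objective: contrastive term plus weighted captioning term."""
    return (contrastive_loss(sensor_emb, text_emb)
            + caption_weight * caption_loss(token_logits, target_ids))
```

In this sketch the contrastive term shapes the embedding space used for linear probing and zero-shot classification, while the captioning term trains the language decoder used for generation; how SLIP actually balances the two is not stated in the abstract.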