Context-Informed Sequence Classification: A Multimodal Approach to Vehicle Diagnostics
Abstract
Effective vehicle diagnostics are critical for safety and predictive maintenance but often rely solely on asynchronous discrete sequences of Diagnostic Trouble Codes (DTCs), overlooking valuable environmental context. This paper introduces BiCarFormer, a multimodal bidirectional Transformer that fuses DTC sequences with tokenized sensory data (temperature, pressure, humidity) via a co-attention mechanism and special embeddings. By integrating these heterogeneous modalities, BiCarFormer addresses the complexity and noise inherent in real-world automotive data. Evaluations on a large-scale fleet dataset of 22,137 error codes and 360 error patterns demonstrate that our approach significantly outperforms single-modality baselines. We also show that, in this setting, the Transformer can learn the fluctuations of quantized continuous values through attention.
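To make the notion of tokenized sensory data concrete, the following is a minimal sketch of how a continuous reading could be quantized into a discrete token before being fed to a Transformer. The abstract does not specify the quantization scheme, so equal-width binning, the value range, the bin count, and the names make_bin_edges and tokenize_reading are all illustrative assumptions rather than the BiCarFormer implementation.

```python
from bisect import bisect_right


def make_bin_edges(low, high, n_bins):
    """Return the n_bins - 1 interior edges of equal-width bins over [low, high].

    Equal-width binning is an assumption made for illustration only.
    """
    width = (high - low) / n_bins
    return [low + i * width for i in range(1, n_bins)]


def tokenize_reading(name, value, edges):
    """Map a continuous reading to a discrete token such as 'TEMP_BIN_6'.

    Values below the first edge fall in bin 0; values at or above the
    last edge fall in the final bin.
    """
    bin_index = bisect_right(edges, value)
    return f"{name}_BIN_{bin_index}"


# Hypothetical temperature range of -40 to 60 degrees Celsius, split into 10 bins.
temp_edges = make_bin_edges(-40.0, 60.0, 10)
tokens = [tokenize_reading("TEMP", t, temp_edges) for t in (-35.0, 21.5, 58.0)]
print(tokens)
```

Under these assumptions, each sensor token could sit in the same vocabulary space as the DTC tokens or in a separate one; which choice the model makes is not stated in the abstract.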