Poster in Workshop: The First Workshop on Efficient Spatial Reasoning
Mon, Apr 27, 2026 • 10:25 AM – 11:25 AM PDT

FlashDriveVLA: Towards Real-time Inference for Autonomous Driving Vision-Language-Action Model

Yihao Liang ⋅ Zekai Li ⋅ Hongfei Zhang ⋅ Jian Chen ⋅ Zhijian Liu

Project Page [OpenReview]

Abstract

While the recent Alpamayo1 model sets a new baseline for Vision-Language-Action (VLA) models in autonomous driving, its significant inference latency precludes deployment on edge devices. In this work, we systematically analyze performance bottlenecks across each inference stage (encode, prefill, decode, and action) of Alpamayo1-10B, revealing that the model suffers from severe spatial redundancy. To bridge this gap, we propose FlashDriveVLA, an algorithm-system co-design framework that comprehensively addresses the efficiency bottlenecks at each stage. FlashDriveVLA reduces end-to-end latency from 769.2 ms to 158.2 ms (4.9x speedup), successfully bringing the autonomous driving VLA closer to real-time inference on edge hardware.
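The reported speedup follows directly from the two end-to-end latencies in the abstract. The sketch below is only an arithmetic check of those figures; the function name `summarize_latency` and the derived quantities (percentage reduction, inferences per second) are illustrative additions, not values or code from the authors.

```python
def summarize_latency(baseline_ms: float, optimized_ms: float) -> dict:
    """Derive simple ratios from two end-to-end latencies given in milliseconds."""
    return {
        # How many times faster the optimized pipeline is.
        "speedup": baseline_ms / optimized_ms,
        # Share of the original latency that was removed, in percent.
        "latency_reduction_pct": 100 * (1 - optimized_ms / baseline_ms),
        # Inferences per second if runs are executed back to back (assumption).
        "baseline_hz": 1000 / baseline_ms,
        "optimized_hz": 1000 / optimized_ms,
    }


# Latencies quoted in the abstract: 769.2 ms before, 158.2 ms after.
stats = summarize_latency(769.2, 158.2)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Rounded to one decimal place, the ratio matches the 4.9x stated in the abstract. The per-second rates assume strictly sequential inference with no pipelining, which the abstract does not specify.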
