FlashDriveVLA: Towards Real-time Inference for Autonomous Driving Vision-Language-Action Model
Yihao Liang ⋅ Zekai Li ⋅ Hongfei Zhang ⋅ Jian Chen ⋅ Zhijian Liu
Abstract
While the recent Alpamayo1 model sets a new baseline for Vision-Language-Action (VLA) models in autonomous driving, its high inference latency precludes deployment on edge devices. In this work, we systematically analyze the performance bottlenecks in each inference stage (encode, prefill, decode, and action) of Alpamayo1-10B and find that the model suffers from severe spatial redundancy. To close this latency gap, we propose FlashDriveVLA, an algorithm-system co-design framework that addresses the efficiency bottleneck at every stage. FlashDriveVLA reduces end-to-end latency from 769.2 ms to 158.2 ms (a 4.9x speedup), bringing autonomous driving VLA models closer to real-time inference on edge hardware.
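The arithmetic behind the headline figures can be checked with a short sketch. It assumes the four stages named above (encode, prefill, decode, action) run strictly one after another, so end-to-end latency is their sum. Pipelining or overlapping stages would break that assumption. The helper names (`end_to_end_ms`, `stage_shares`, `speedup`, `rate_hz`) are hypothetical and are not part of FlashDriveVLA. Only the two reported end-to-end totals are used, because no per-stage numbers are given here.

```python
from typing import Mapping

# Stage names as listed in the abstract; the strictly sequential execution is an assumption.
STAGES = ("encode", "prefill", "decode", "action")


def end_to_end_ms(stage_ms: Mapping[str, float]) -> float:
    """Sum per-stage latencies (ms), assuming the stages do not overlap."""
    missing = [s for s in STAGES if s not in stage_ms]
    if missing:
        raise ValueError(f"missing stages: {missing}")
    return sum(stage_ms[s] for s in STAGES)


def stage_shares(stage_ms: Mapping[str, float]) -> dict:
    """Fraction of end-to-end latency spent in each stage."""
    total = end_to_end_ms(stage_ms)
    return {s: stage_ms[s] / total for s in STAGES}


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Ratio of baseline latency to optimized latency."""
    return baseline_ms / optimized_ms


def rate_hz(latency_ms: float) -> float:
    """Inferences per second if each one must finish before the next begins."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    baseline, optimized = 769.2, 158.2  # end-to-end totals reported in the abstract
    print(f"speedup: {speedup(baseline, optimized):.2f}x")  # 4.86x, i.e. ~4.9x
    print(f"baseline rate: {rate_hz(baseline):.2f} Hz")  # 1.30 Hz
    print(f"optimized rate: {rate_hz(optimized):.2f} Hz")  # 6.32 Hz
```

The exact ratio is about 4.86, which rounds to the reported 4.9x. Under the sequential-execution assumption, the two totals correspond to roughly 1.3 and 6.3 inferences per second.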