Poster in Workshop: 7th Robot Learning Workshop: Towards Robots with Human-Level Abilities
FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies
Moritz Reuss · Hongyi Zhou · Marcel Rühle · Ömer Yağmurlu · Fabian Otto · Rudolf Lioutikov
This work introduces FLOWER, an efficient, open-source Vision-Language-Action Flow policy. Vision-Language-Action (VLA) models have demonstrated remarkable potential for language-guided robotic manipulation by leveraging large-scale vision-language pretraining. However, existing approaches often rely on multi-billion-parameter architectures and massive datasets, making them prohibitively expensive to train. FLOWER is a novel generalist policy that not only outperforms current VLAs but also substantially lowers the computational burden for pretraining, fine-tuning, and inference. FLOWER combines a Rectified Flow Policy with a compact Vision-Language Model (VLM) backbone. The Flow Policy enables expressive, multimodal action generation, while the compact VLM backbone provides robust semantic grounding at only a fraction of the usual compute cost. Experiments across four simulated benchmarks and real-world settings on more than 100 tasks reveal that FLOWER consistently surpasses foundation policies such as OpenVLA. FLOWER achieves superior performance while significantly reducing both training time and memory requirements. Both the performance and the training efficiency are maintained across different action spaces, showcasing the potential of FLOWER to handle diverse control tasks with affordable deployment, fine-tuning, and customization. To encourage further research and the democratization of pretrained VLAs, we open-source the full pretraining and fine-tuning code along with the trained weights.
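To make the rectified-flow component of the abstract concrete, the sketch below illustrates the general recipe of a rectified flow policy: a velocity field is trained to point from noise toward ground-truth actions along a straight interpolation path, and actions are sampled at inference time by Euler-integrating that field conditioned on vision-language features. This is a minimal, assumption-based illustration, not FLOWER's actual implementation; the names `VelocityNet`, `rectified_flow_loss`, `sample_actions`, and the dimensions are hypothetical placeholders.

```python
# Minimal rectified-flow sketch for conditional action generation (illustrative only).
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Predicts the flow velocity v(a_t, t | c) for an action vector,
    conditioned on fused vision-language features c."""
    def __init__(self, action_dim: int, cond_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(action_dim + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, a_t, t, cond):
        # Concatenate noisy action, conditioning features, and scalar time.
        return self.net(torch.cat([a_t, cond, t], dim=-1))

def rectified_flow_loss(model, actions, cond):
    """Regress the velocity (a_1 - a_0) along the straight noise-to-data path."""
    noise = torch.randn_like(actions)            # a_0 ~ N(0, I)
    t = torch.rand(actions.shape[0], 1)          # random time in [0, 1]
    a_t = (1 - t) * noise + t * actions          # linear interpolation
    v_target = actions - noise
    v_pred = model(a_t, t, cond)
    return ((v_pred - v_target) ** 2).mean()

@torch.no_grad()
def sample_actions(model, cond, action_dim: int, steps: int = 10):
    """Euler-integrate the learned ODE from noise (t=0) to an action (t=1)."""
    a = torch.randn(cond.shape[0], action_dim)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((cond.shape[0], 1), i * dt)
        a = a + dt * model(a, t, cond)
    return a
```

In this style of policy, the straight-line interpolation lets inference use very few integration steps, which is one reason flow-based action heads are attractive for low-latency robot control.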