Poster in Workshop: 2nd Workshop on Mathematical and Empirical Understanding of Foundation Models
Few-Shot Adaptation of Vision-Language Foundation Models via Dual-Path Inference
Ce Zhang · Simon Stepputtis · Katia Sycara · Yaqi Xie
Leveraging vast datasets from the Internet, large-scale Vision-Language Models (VLMs) demonstrate great potential in learning open-world visual concepts, and exhibit remarkable performance across a wide range of downstream tasks through efficient fine-tuning. In this work, we propose a simple yet effective fine-tuning approach called DualAdapter, which for the first time investigates the inference capabilities of VLMs along both positive and negative directions. Unlike conventional approaches that rely solely on positive adapter-style fine-tuning, DualAdapter uniquely incorporates negative text descriptions and image samples, enabling fine-tuning from a dual perspective. During the few-shot adaptation process, our DualAdapter explicitly enhances correct alignments while simultaneously minimizing incorrect associations. Our rigorous evaluation across 15 datasets reveals that DualAdapter significantly surpasses existing state-of-the-art methods in terms of both adaptation efficiency and robustness to distribution shifts.
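The dual-path idea described above can be sketched as follows. This is a minimal illustration assuming CLIP-style cosine similarities between an image embedding and per-class text embeddings; the function names, the subtractive scoring rule, and the `alpha` weight are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Normalize embeddings to unit length, as in CLIP-style cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def dual_path_scores(image_emb, pos_text_embs, neg_text_embs, alpha=0.5):
    """Combine positive and negative text similarities into class scores.

    image_emb:     (d,) embedding of a query image
    pos_text_embs: (C, d) embeddings of positive prompts, one per class
                   (e.g. "a photo of a {class}")
    neg_text_embs: (C, d) embeddings of negative prompts, one per class
                   (e.g. "a photo without a {class}")
    alpha:         weight on the negative path (a hypothetical parameter)
    """
    img = l2_normalize(image_emb)
    pos = l2_normalize(pos_text_embs)
    neg = l2_normalize(neg_text_embs)
    sim_pos = pos @ img  # higher = image aligns with the class description
    sim_neg = neg @ img  # higher = image aligns with the class negation
    # Reward correct alignments while penalizing incorrect associations.
    return sim_pos - alpha * sim_neg

# Toy example with random 8-dimensional embeddings for 3 classes.
rng = np.random.default_rng(0)
scores = dual_path_scores(rng.normal(size=8),
                          rng.normal(size=(3, 8)),
                          rng.normal(size=(3, 8)))
pred = int(np.argmax(scores))
```

In a real few-shot setting, the negative path would also draw on the support images, but this sketch only shows how the two text-side similarity paths can be combined into a single class score.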