Poster in Workshop: Building Trust in LLMs and LLM Applications: From Guardrails to Explainability to Regulation
MASAN: Enhancing Attack Stealth and Efficacy on Vision-Language Models via Smart Noise
Shuaiqi Wang · Sayali Deshpande · Rajesh Kudupudi · Alireza Mehrtash · Danial Dashti
The advent of vision-language models (VLMs) has unified multiple vision tasks within a single framework, enhancing capabilities but also exposing vulnerabilities to adversarial image perturbations. Traditional adversarial methods, although effective at launching strong attacks, typically introduce perturbations large enough to degrade the human recognizability of images. In this work, we introduce the Min-max Attack with SmArt Noise (MASAN), a new attack that effectively deceives VLMs across various tasks, including visual question answering and image captioning. MASAN uses a gradient-based method to identify the pixels most critical to the attack and concentrate the perturbation on them, improving both the subtlety and the effectiveness of the attack. Our extensive experiments demonstrate that MASAN achieves an average of 5% higher attack success rate with a 30% lower perturbation magnitude compared to state-of-the-art methods across leading VLMs such as BLIP2, LLaVA, Open-Flamingo, and MiniGPT-4.
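The abstract does not spell out the attack's implementation, but the core idea, an iterative gradient-based attack whose perturbation is restricted to the pixels with the largest loss gradients, can be illustrated with a minimal sketch. The sketch below is one plausible PGD-style reading, not the authors' actual method; the function name masan_sketch and the parameters eps, alpha, steps, and topk_frac are hypothetical choices for illustration.

```python
import torch

def masan_sketch(model, loss_fn, image, target, eps=8/255, alpha=1/255,
                 steps=10, topk_frac=0.1):
    """Illustrative gradient-guided sparse attack (assumed, not the paper's code).

    Runs an iterative signed-gradient attack but applies each step only to
    the top fraction of pixels ranked by gradient magnitude, so most of the
    image is left untouched and the perturbation stays subtle.
    """
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = loss_fn(model(adv), target)
        grad, = torch.autograd.grad(loss, adv)
        # "Smart noise" mask: keep only the top-k% of pixels by |gradient|.
        flat = grad.abs().flatten(1)
        k = max(1, int(topk_frac * flat.shape[1]))
        thresh = flat.topk(k, dim=1).values[:, -1].view(-1, 1, 1, 1)
        mask = (grad.abs() >= thresh).float()
        # Signed-gradient ascent step, applied only at the masked pixels.
        adv = adv.detach() + alpha * grad.sign() * mask
        # Project back into the eps-ball around the clean image, then into [0, 1].
        adv = image + (adv - image).clamp(-eps, eps)
        adv = adv.clamp(0, 1)
    return adv.detach()
```

Concentrating the budget on high-gradient pixels is one way to reconcile the two claims in the abstract: a smaller average perturbation magnitude (most pixels unchanged) alongside a higher attack success rate (the changed pixels are the ones the model is most sensitive to).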