Transferable and Stealthy Adversarial Attacks on Large Vision-Language Models
Abstract
Existing adversarial attacks on large Vision-Language Models (VLMs) often exhibit limited transferability to black-box models or introduce perceptible artifacts that are easily detected. This paper presents Progressive Semantic Infusion (PSI), a diffusion-based attack that progressively infuses natural target semantics into the source image. To improve transferability, PSI leverages diffusion priors to align adversarial examples with the natural image distribution and employs progressive alignment to mitigate overfitting to a single fixed surrogate objective. To enhance stealthiness, PSI embeds source-aware cues during denoising to preserve visual fidelity and avoid detectable artifacts. Experiments show that PSI effectively attacks open-source, adversarially trained, and commercial VLMs, including GPT-5 and Grok-4, surpassing existing methods in both transferability and stealthiness. Our findings reveal a critical vulnerability in modern vision-language systems and offer insights for building more robust and trustworthy multimodal models.
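The abstract does not specify implementation details, so the following is only a minimal conceptual sketch of the general recipe it describes: a gradient attack whose target weight grows over the iterations (progressive alignment) and whose iterates are pulled back toward natural-looking images by a denoising step (diffusion prior, source-aware). The function name `progressive_semantic_infusion`, the toy `encoder` and `denoiser` stand-ins, and the linear `alpha` schedule are all illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins: in practice these would be a pretrained
# CLIP-style surrogate image encoder and a pretrained diffusion denoiser.
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))
denoiser = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

def progressive_semantic_infusion(x_src, x_tgt, steps=50, eps=8 / 255):
    """Sketch: progressively steer the source image toward the target's
    semantics while a denoising pass keeps the iterate near the natural
    image manifold (assumed reading of the abstract, not the PSI paper)."""
    x_adv = x_src.clone()
    for t in range(steps):
        x_adv.requires_grad_(True)
        # Progressive alignment: the weight on the target embedding grows
        # over the schedule instead of being fixed, the stated idea for
        # avoiding overfitting to a single fixed surrogate objective.
        alpha = (t + 1) / steps
        f_adv, f_tgt, f_src = encoder(x_adv), encoder(x_tgt), encoder(x_src)
        loss = (alpha * F.cosine_similarity(f_adv, f_tgt).mean()
                - (1 - alpha) * F.cosine_similarity(f_adv, f_src).mean())
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            # Gradient ascent on the alignment objective.
            x_adv = x_adv + (eps / steps) * grad.sign()
            # Source-aware denoising step: blend in a denoised version of
            # the iterate to suppress perceptible artifacts.
            x_adv = 0.9 * x_adv + 0.1 * denoiser(x_adv)
            # Project back into the epsilon-ball and valid pixel range.
            x_adv = x_src + (x_adv - x_src).clamp(-eps, eps)
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

x_src = torch.rand(1, 3, 32, 32)  # clean source image
x_tgt = torch.rand(1, 3, 32, 32)  # image carrying the target semantics
x_adv = progressive_semantic_infusion(x_src, x_tgt)
```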