Poster

Wayward Concepts In Large Multimodal Models

Brandon Trabucco · Max Gurinas · Kyle Doherty · Russ Salakhutdinov

Hall 3 + Hall 2B #564
Fri 25 Apr midnight PDT — 2:30 a.m. PDT

Abstract: Large multimodal models such as Stable Diffusion can generate, detect, and classify new visual concepts after optimizing just the prompt. How do the prompt embeddings for visual concepts found by prompt-tuning methods differ from typical discrete prompts? We conduct a large-scale analysis on three state-of-the-art models in text-to-image generation, open-set object detection, and zero-shot classification, and find that prompts optimized to represent new visual concepts are akin to an adversarial attack on the text encoder. Across 4,800 new embeddings trained for 40 diverse visual concepts on four standard datasets, we find perturbations within an ϵ-ball of any prompt that reprogram models to generate, detect, and classify arbitrary subjects. These perturbations target the final layers of text encoders and steer pooling tokens toward the subject. We explore the transferability of these prompts and find that the perturbations that reprogram multimodal models are both initialization-specific and model-specific. Code for reproducing our work is available at the following site: https://wayward-concepts.github.io.
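Below is a minimal sketch (not the authors' released code; see the linked site for that) of the kind of ϵ-ball-constrained prompt-embedding optimization the abstract describes. A toy MLP stands in for a real text encoder such as Stable Diffusion's CLIP encoder, and the target features, loss, and value of ϵ are illustrative placeholders.

    # Hedged sketch of epsilon-ball prompt tuning: optimize one token
    # embedding for a new visual concept while projecting it back into
    # an L2 ball around its initialization. All concrete values here
    # (dim, epsilon, the encoder, the target) are assumptions.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    dim, epsilon, steps, lr = 768, 0.1, 200, 1e-2

    # Placeholder frozen text encoder (stand-in for a CLIP-style encoder).
    text_encoder = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
    for p in text_encoder.parameters():
        p.requires_grad_(False)

    # Initialize the learnable concept token from an existing embedding.
    init = torch.randn(dim)                   # stand-in for a real token embedding
    embedding = init.clone().requires_grad_(True)

    # Hypothetical target features for the new visual concept.
    target = torch.randn(dim)

    opt = torch.optim.Adam([embedding], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (text_encoder(embedding) - target).pow(2).mean()
        loss.backward()
        opt.step()
        # Project back into the epsilon-ball around the initialization,
        # mirroring the finding that nearby perturbations suffice.
        with torch.no_grad():
            delta = embedding - init
            norm = delta.norm()
            if norm > epsilon:
                embedding.copy_(init + delta * (epsilon / norm))

    print(f"final loss: {loss.item():.4f}  "
          f"||delta||: {(embedding - init).norm().item():.4f}")

The projection step is what keeps the learned embedding within an ϵ-ball of its initialization, corresponding to the abstract's observation that small perturbations to any prompt are enough to reprogram the model.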
