Poster
Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate
Byung Hyun Lee · Sungjin Lim · Seunggyu Lee · Dong Un Kang · Se Young Chun
Hall 3 + Hall 2B #171
Remarkable progress in text-to-image diffusion models has raised a major concern about potentially generating images of inappropriate or trademarked concepts. Concept erasing has been investigated with the goal of deleting target concepts from diffusion models while preserving the remaining concepts with minimal distortion. To achieve these goals, recent concept erasing methods usually fine-tune the cross-attention layers of diffusion models. In this work, we first show that merely updating the cross-attention layers in diffusion models, which is mathematically equivalent to adding linear modules to weights, may not be able to preserve diverse remaining concepts. Then, we propose a novel framework, dubbed Concept Pinpoint Eraser (CPE), which adds nonlinear Residual Attention Gates (ResAGs) that selectively erase (or cut) target concepts while safeguarding remaining concepts from broad distributions through an attention anchoring loss that prevents forgetting. Moreover, we adversarially train CPE with the ResAGs and learnable text embeddings in an iterative manner to maximize erasing performance and enhance robustness against adversarial attacks. Extensive experiments on erasing celebrities, artistic styles, and explicit content demonstrate that the proposed CPE outperforms prior art, keeping diverse remaining concepts while deleting the target concepts with robustness against attack prompts.
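To make the core idea concrete, below is a minimal, illustrative sketch (not the authors' code) of a nonlinear residual gate added on top of a frozen cross-attention output. The low-rank residual path, the sigmoid gate driven by the prompt embedding, and all dimensions are assumptions for illustration; the actual ResAG architecture, attention anchoring loss, and adversarial training loop are described in the paper.

```python
# Hypothetical sketch of a "Residual Attention Gate" (ResAG)-style module:
# a small nonlinear residual branch added to a frozen cross-attention output,
# gated by the text embedding so it activates mainly for the target concept.
import torch
import torch.nn as nn


class ResidualAttentionGateSketch(nn.Module):
    """Illustrative gate: low-rank nonlinear residual, scaled by a prompt-dependent gate."""

    def __init__(self, attn_dim: int, text_dim: int, rank: int = 16):
        super().__init__()
        self.down = nn.Linear(attn_dim, rank, bias=False)   # low-rank projection (assumed)
        self.up = nn.Linear(rank, attn_dim, bias=False)
        self.gate = nn.Sequential(nn.Linear(text_dim, 1), nn.Sigmoid())  # concept-selective gate (assumed)

    def forward(self, attn_out: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # attn_out: (B, N, attn_dim) frozen cross-attention output
        # text_emb: (B, T, text_dim) prompt embedding; gate ~ 0 keeps remaining concepts intact
        g = self.gate(text_emb.mean(dim=1)).unsqueeze(1)     # (B, 1, 1)
        return attn_out + g * self.up(torch.tanh(self.down(attn_out)))


if __name__ == "__main__":
    B, N, T, D_attn, D_text = 2, 64, 77, 320, 768
    attn_out = torch.randn(B, N, D_attn)
    text_emb = torch.randn(B, T, D_text)
    out = ResidualAttentionGateSketch(D_attn, D_text)(attn_out, text_emb)
    print(out.shape)  # torch.Size([2, 64, 320])
```

In this sketch, only the gate module would be trained while the diffusion model stays frozen, so remaining concepts pass through nearly unchanged when the gate output is close to zero.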