Spotlight Poster
DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models
Chong Mou · Xintao Wang · Jiechong Song · Ying Shan · Jian Zhang
Halle B #15
Despite the ability of text-to-image (T2I) diffusion models to generate high-quality images, transferring this ability to accurate image editing remains a challenge. In this paper, we propose a novel image editing method, DragonDiffusion, enabling drag-style manipulation on diffusion models. Specifically, we treat image editing as a change in feature correspondence within a pre-trained diffusion model. By leveraging this feature correspondence, we develop energy functions that align with the editing target, transforming image editing operations into gradient guidance. On top of this guidance, we construct multi-scale guidance that accounts for both semantic and geometric alignment. Furthermore, we incorporate a visual cross-attention strategy built on a memory bank to ensure consistency between the edited result and the original image. Thanks to these efficient designs, all content editing and consistency operations derive from feature correspondence, without extra model fine-tuning. Extensive experiments demonstrate that our method achieves promising performance on a variety of image editing tasks, both within a single image (e.g., object moving, resizing, and content dragging) and across images (e.g., appearance replacing and object pasting). Code is available at https://github.com/MC-E/DragonDiffusion.
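To illustrate the core idea of turning an editing objective into gradient guidance, the sketch below builds a toy feature-correspondence energy and backpropagates it to the latent being denoised. It is a minimal, self-contained illustration under assumed interfaces: `correspondence_energy`, the stand-in `feature_net`, the source/target masks, and `guidance_scale` are hypothetical placeholders, not the paper's actual energy functions, UNet features, or guidance schedule.

```python
import torch
import torch.nn.functional as F

def correspondence_energy(features: torch.Tensor, src_mask: torch.Tensor,
                          tgt_mask: torch.Tensor) -> torch.Tensor:
    # Toy energy: pooled features inside the target region should match the
    # pooled features inside the source (dragged-from) region.
    # Lower energy = better alignment with the editing target.
    f = features.flatten(2)                                   # (B, C, H*W)
    src = (f * src_mask.flatten(1).unsqueeze(1)).sum(-1) / src_mask.sum().clamp(min=1)
    tgt = (f * tgt_mask.flatten(1).unsqueeze(1)).sum(-1) / tgt_mask.sum().clamp(min=1)
    return (1.0 - F.cosine_similarity(src, tgt, dim=1)).mean()

# Toy setup: a random "latent" being denoised and a stand-in feature extractor
# (in the actual method, features come from the pre-trained diffusion UNet).
B, C, H, W = 1, 4, 16, 16
x_t = torch.randn(B, C, H, W, requires_grad=True)             # latent at timestep t
feature_net = torch.nn.Conv2d(C, 32, kernel_size=3, padding=1)

src_mask = torch.zeros(B, H, W); src_mask[:, 2:6, 2:6] = 1.0      # region content is dragged from
tgt_mask = torch.zeros(B, H, W); tgt_mask[:, 8:12, 8:12] = 1.0    # region it should move to

# Energy -> gradient guidance: backpropagate the editing objective to the latent
# and add the scaled gradient to the denoiser's noise prediction.
energy = correspondence_energy(feature_net(x_t), src_mask, tgt_mask)
grad = torch.autograd.grad(energy, x_t)[0]

eps_pred = torch.randn_like(x_t)            # placeholder for the UNet noise prediction
guidance_scale = 100.0                      # illustrative value, not from the paper
eps_guided = eps_pred + guidance_scale * grad
print(energy.item(), eps_guided.shape)
```

In a full sampler, such a guided noise prediction would replace the standard prediction at each denoising step; the paper additionally uses multi-scale features and a memory-bank-based visual cross-attention for consistency, which this sketch omits.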