Poster
DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation
Jing He · Haodong Li · huyongzhe · Guibao Shen · Yingjie CAI · Weichao Qiu · YINGCONG CHEN
Hall 3 + Hall 2B #115
Thu 24 Apr, midnight to 2:30 a.m. PDT
Abstract:
In the realm of image generation, creating customized images from a visual prompt with additional textual instructions is emerging as a promising endeavor. However, existing methods, both tuning-based and tuning-free, struggle to interpret the subject-essential attributes of the visual prompt. As a result, subject-irrelevant attributes infiltrate the generation process, ultimately compromising personalization quality in both editability and ID preservation. In this paper, we present DisEnvisioner, a novel approach that effectively extracts and enriches the subject-essential features while filtering out subject-irrelevant information, enabling exceptional customization performance in a tuning-free manner and using only a single image. Specifically, the features of the subject and of other irrelevant components are separated into distinct visual tokens, enabling much more accurate customization. To further improve ID consistency, we enrich the disentangled features, sculpting them into a more granular representation. Experiments demonstrate the superiority of our approach over existing methods in instruction response (editability), ID consistency, inference speed, and overall image quality, highlighting the effectiveness and efficiency of DisEnvisioner.