Poster
Dynamic Diffusion Transformer
Wangbo Zhao · Yizeng Han · Jiasheng Tang · Kai Wang · Yibing Song · Gao Huang · Fan Wang · Yang You
Hall 3 + Hall 2B #180
Diffusion Transformer (DiT), an emerging diffusion model for image generation, has demonstrated superior performance but suffers from substantial computational costs. Our investigations reveal that these costs stem from the static inference paradigm, which inevitably introduces redundant computation in certain diffusion timesteps and spatial regions. To address this inefficiency, we propose Dynamic Diffusion Transformer (DyDiT), an architecture that dynamically adjusts its computation along both timestep and spatial dimensions during generation. Specifically, we introduce a Timestep-wise Dynamic Width (TDW) approach that adapts model width conditioned on the generation timesteps. In addition, we design a Spatial-wise Dynamic Token (SDT) strategy to avoid redundant computation at unnecessary spatial locations. Extensive experiments on various datasets and different-sized models verify the superiority of DyDiT. Notably, with <3% additional fine-tuning iterations, our method reduces the FLOPs of DiT-XL by 51%, accelerates generation by 1.73×, and achieves a competitive FID score of 2.07 on ImageNet.
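The abstract does not include an implementation, so the following is only a minimal PyTorch sketch of the two mechanisms it describes: a timestep-conditioned gate that narrows model width (TDW) and a per-token router that lets uninformative tokens skip a block (SDT). All module names (TDWGate, SDTRouter), dimensions, and the thresholding scheme are illustrative assumptions, not the authors' released code.

```python
# Illustrative sketch only; names and shapes are assumptions, not DyDiT's actual code.
import torch
import torch.nn as nn

class TDWGate(nn.Module):
    """Timestep-wise Dynamic Width (TDW): predict per-head and per-MLP-group
    keep masks from a timestep embedding, so effective width varies with t."""
    def __init__(self, t_dim: int, num_heads: int, num_mlp_groups: int):
        super().__init__()
        self.head_logits = nn.Linear(t_dim, num_heads)
        self.mlp_logits = nn.Linear(t_dim, num_mlp_groups)

    def forward(self, t_emb: torch.Tensor):
        # Hard 0/1 masks at inference; a straight-through or Gumbel relaxation
        # would presumably be used during the short fine-tuning phase.
        head_mask = (self.head_logits(t_emb) > 0).float()  # (B, num_heads)
        mlp_mask = (self.mlp_logits(t_emb) > 0).float()    # (B, num_mlp_groups)
        return head_mask, mlp_mask

class SDTRouter(nn.Module):
    """Spatial-wise Dynamic Token (SDT): decide per token whether a block
    processes it or whether it passes through the residual path unchanged."""
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor, block: nn.Module):
        keep = self.scorer(x).squeeze(-1) > 0      # (B, N) bool: tokens to compute
        out = x.clone()
        out[keep] = x[keep] + block(x[keep])       # residual update only on kept tokens
        return out

# Toy usage: batch of 4, 16 tokens, width 64, timestep embedding of size 32.
B, N, D, T = 4, 16, 64, 32
x, t_emb = torch.randn(B, N, D), torch.randn(B, T)
gate = TDWGate(T, num_heads=8, num_mlp_groups=4)
head_mask, mlp_mask = gate(t_emb)                  # width masks per sample
mlp = nn.Sequential(nn.Linear(D, 4 * D), nn.GELU(), nn.Linear(4 * D, D))
y = SDTRouter(D)(x, mlp)                           # (4, 16, 64)
print(head_mask.shape, mlp_mask.shape, y.shape)
```

In this reading, the FLOPs savings come from two sources: TDW zeroes out whole attention heads or MLP channel groups at timesteps where full width is unnecessary, while SDT processes only the tokens its scorer keeps, so the skipped tokens cost nothing in the gated block.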