

Poster in Workshop: Deep Generative Model in Machine Learning: Theory, Principle and Efficacy

EDM2+: Exploring Efficient Diffusion Model Architectures for Visual Generation

Toyota Li

Keywords: [ neural network architecture ] [ diffusion models ]


Abstract: The training and sampling of diffusion models have been studied exhaustively in prior work. In contrast, the design of the underlying network architecture still rests on a shaky empirical footing. Moreover, in line with recent scaling-law trends, large-scale models are making inroads into generative vision tasks. Running such large diffusion models, however, incurs a sizeable computational burden, making it desirable to optimize computation and allocate resources efficiently. To bridge these gaps, we navigate the design landscape of efficient U-Net based diffusion models, starting from the well-established EDM2. Our exploration is organized along two key axes: layer placement and module interconnection. We systematically study fundamental design choices and uncover several intriguing insights that improve both efficacy and efficiency. These findings culminate in our redesigned architecture, EDM2+, which reduces the computational complexity of the EDM2 baseline by $2\times$ without compromising generation quality. Extensive experiments and comparative analyses highlight the effectiveness of the proposed architecture, which achieves state-of-the-art FID on the ImageNet benchmark. Code will be released upon acceptance.
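The abstract names two design axes, layer placement and module interconnection, but the paper's concrete choices are not available here. The following is a minimal, hypothetical sketch of what varying those axes can look like in a U-Net style residual block; the block structure, parameter names, and options below are illustrative assumptions, not the EDM2+ architecture.

```python
# Hypothetical sketch (not the EDM2+ code): a configurable U-Net block that
# exposes the two design axes named in the abstract -- where layers are placed
# inside a block, and how modules are wired together.
import torch
import torch.nn as nn


class ConfigurableBlock(nn.Module):
    def __init__(self, channels: int, norm_first: bool = True, skip: str = "add"):
        super().__init__()
        assert skip in ("add", "concat")
        self.norm_first = norm_first   # layer-placement axis: pre- vs. post-normalization
        self.skip = skip               # interconnection axis: additive vs. concatenated skip
        self.norm = nn.GroupNorm(8, channels)
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.SiLU()
        out_ch = channels * 2 if skip == "concat" else channels
        self.proj = nn.Conv2d(out_ch, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Layer placement: apply normalization before or after the convolution.
        h = self.norm(x) if self.norm_first else x
        h = self.act(self.conv(h))
        if not self.norm_first:
            h = self.norm(h)
        # Module interconnection: merge the branch with the input either by
        # addition (cheaper) or by channel concatenation (more expressive).
        merged = x + h if self.skip == "add" else torch.cat([x, h], dim=1)
        return self.proj(merged)


if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)
    for norm_first in (True, False):
        for skip in ("add", "concat"):
            y = ConfigurableBlock(64, norm_first=norm_first, skip=skip)(x)
            print(norm_first, skip, tuple(y.shape))
```

Sweeping such configurations and measuring FID against FLOPs is one plausible way the kind of layer-placement and interconnection study described in the abstract could be organized.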
