From U-Nets to DiTs: The Architectural Evolution of Text-to-Image Diffusion Models (2021–2025)
Zhenyuan Chen · Zechuan Zhang · Feng Zhang
Abstract
A comprehensive analysis of how diffusion model architectures evolved from U-Net backbones to Diffusion Transformers, transforming text-to-image generation capabilities.