Spotlight Poster

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

Dustin Podell · Zion English · Kyle Lacey · Andreas Blattmann · Tim Dockhorn · Jonas Müller · Joe Penna · Robin Rombach

Halle B #252
Wed 8 May 7:30 a.m. PDT — 9:30 a.m. PDT


We present Stable Diffusion XL (SDXL), a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL uses a UNet backbone three times larger. The growth comes mainly from a significantly larger number of attention blocks and a larger cross-attention context, because SDXL adds a second text encoder. We also design several novel conditioning schemes and train SDXL on multiple aspect ratios. To ensure the highest-quality results, we introduce a refinement model that improves the visual fidelity of SDXL samples through a post-hoc image-to-image technique. We demonstrate that SDXL improves dramatically over previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators such as Midjourney.
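The abstract does not say how training on multiple aspect ratios is organized. One common approach is aspect-ratio bucketing: each image goes into a resolution bucket whose pixel count stays near a fixed target, and each batch is drawn from a single bucket. The sketch below illustrates that idea only. The target of about 1024×1024 pixels, the 64-pixel step, the side limits, and the helpers `make_buckets` and `assign_bucket` are all illustrative assumptions and hypothetical names, not the authors' training code.

```python
import math


def make_buckets(target_pixels=1024 * 1024, step=64, min_side=512, max_side=2048):
    """Enumerate hypothetical (height, width) buckets with area close to target_pixels.

    Both sides are multiples of `step`; for each height, the width is the
    multiple of `step` that brings height * width closest to the target.
    """
    buckets = []
    for h in range(min_side, max_side + 1, step):
        w = round(target_pixels / h / step) * step
        if min_side <= w <= max_side:
            buckets.append((h, w))
    return buckets


def assign_bucket(img_h, img_w, buckets):
    """Pick the bucket whose aspect ratio (h / w) is closest to the image's, in log space."""
    ratio = img_h / img_w
    return min(buckets, key=lambda b: abs(math.log((b[0] / b[1]) / ratio)))


buckets = make_buckets()
square = assign_bucket(1024, 1024, buckets)    # a square image lands in the (1024, 1024) bucket
portrait = assign_bucket(1536, 1024, buckets)  # a tall image lands in a bucket with height > width
```

Comparing ratios in log space treats a 2:1 image and a 1:2 image symmetrically, so portrait and landscape inputs are matched equally well.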

