A Noise is Worth Diffusion Guidance
Abstract
Diffusion models have demonstrated remarkable image generation capabilities, but their performance heavily relies on sampling guidance such as classifier-free guidance (CFG). While sampling guidance significantly enhances image quality, it requires two forward passes at every denoising step, incurring substantial computational overhead. Existing approaches mitigate this cost through distillation, training a student network to mimic the guided predictions. In contrast, we take a distinct approach by refining the initial Gaussian noise, a critical yet under-explored factor in diffusion-based generation pipelines. We introduce a noise refinement framework, NoiseRefine, in which a refining network is trained to minimize the difference between images generated by unguided sampling from the refined noise and those produced by guided sampling from the input Gaussian noise. This simple approach shows that refined noise alleviates artifacts and mitigates structural collapse, yielding significantly higher image quality than pure Gaussian noise. It does so without modifying the diffusion model, thereby preserving the model's prior knowledge and its compatibility with finetuned or timestep-distilled variants. Beyond these practical benefits, we provide an in-depth analysis of refined noise, offering insights into its role in the denoising process and its interaction with guidance. Our findings suggest that structured noise initialization is key to efficient and high-fidelity image synthesis. Project page: https://cvlab-kaist.github.io/NoiseRefine/
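The training objective stated in the abstract can be written schematically as follows. This is our own sketch of the loss; the symbols $f_\theta$ (the refining network), $D_{\text{uncond}}$ (unguided sampling), and $D_{\text{CFG}}$ (guided sampling) are notation we introduce here, not the paper's:

```latex
% Schematic NoiseRefine objective: train the refiner f_theta so that
% unguided sampling from the refined noise f_theta(eps) matches guided
% (CFG) sampling from the original Gaussian noise eps. The denoiser D
% is frozen throughout; only theta is updated.
\mathcal{L}(\theta) =
  \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}
  \left\|
    D_{\text{uncond}}\!\left( f_\theta(\epsilon) \right)
    - D_{\text{CFG}}(\epsilon)
  \right\|_2^2
```

Because only $f_\theta$ receives gradients, the diffusion model's weights, and hence its prior knowledge and compatibility with finetuned or timestep-distilled variants, are left untouched.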