Sample Quality-Likelihood Trade-off in Diffusion Models
Abstract
Recent advances in diffusion models have achieved remarkable perceptual quality. However, previous studies have shown an inverse correlation between perceptual quality and data-likelihood: in diffusion models, the data-likelihood is dominated by the low noise levels, while the perceptual quality depends mostly on the high noise levels. Consequently, there exists a trade-off between sample quality and data-likelihood, and models are usually trained to enhance only one of the two. In this paper, we present a simple yet novel method to alleviate this trade-off by fusing two different pre-trained diffusion models in a Mixture-of-Experts fashion: at high noise levels we use an expert model for perceptual quality, and at low noise levels we use an expert model for data-likelihood. In experiments, our fused model achieves the best of both base models: a likelihood comparable to or better than that of its likelihood expert, while almost reaching the perceptual quality of its image-quality expert.
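To make the fusion mechanism concrete, the following is a minimal sketch (not the authors' released code) of the Mixture-of-Experts dispatch described above: a single noise-level threshold routes each denoising step to one of the two pre-trained experts. The names `quality_expert`, `likelihood_expert`, and `t_switch` are illustrative assumptions, not identifiers from the paper.

```python
import torch

class FusedDiffusionModel(torch.nn.Module):
    """Sketch of a Mixture-of-Experts fusion of two pre-trained
    diffusion experts, assuming both share the same noise schedule
    and predict noise (epsilon) given (x_t, t)."""

    def __init__(self, quality_expert, likelihood_expert, t_switch):
        super().__init__()
        # Expert trained for perceptual quality, used at high noise levels.
        self.quality_expert = quality_expert
        # Expert trained for data-likelihood, used at low noise levels.
        self.likelihood_expert = likelihood_expert
        # Assumed timestep threshold above which noise counts as "high".
        self.t_switch = t_switch

    @torch.no_grad()
    def forward(self, x_t, t):
        # Dispatch the epsilon-prediction to the expert responsible
        # for the current noise level.
        if t >= self.t_switch:
            return self.quality_expert(x_t, t)    # high noise: sample quality
        return self.likelihood_expert(x_t, t)     # low noise: likelihood
```

Because the switch happens per timestep and the experts are frozen, this fusion needs no additional training; the only free choice under these assumptions is the threshold `t_switch`.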