Divide-and-Denoise: A Game-Theoretic Method for Fairly Composing Diffusion Models
Abstract
Despite the abundance of pre-trained diffusion models, it remains unclear how to use them collectively. We propose a coordination approach based on a fair yet efficient division of labor. Divide-and-Denoise uses multiple pre-trained diffusion models to refine a noisy sample over time by alternating between (i) dividing the sample into regions according to game-theoretic criteria and (ii) denoising each region with its assigned model. The result is a composite denoising process that evolves alongside the division process. In conditional image generation, we evaluate Divide-and-Denoise on coordinating single-concept diffusion models, comparing it against prior compositional approaches and a multi-concept model. Across several metrics, including the GenEval benchmark, our method generates images that capture each model's strengths, outperforms the baselines, and resolves common failures such as missing objects and mismatched attributes.
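The abstract does not spell out the game-theoretic criteria, so the following is only a toy sketch of the divide/denoise alternation, not the authors' algorithm. It makes several hypothetical assumptions. Each "model" is a stand-in that nudges the sample toward a fixed target. Each model values a pixel by how closely the current sample already matches its target. The division rule is a swapped-in technique: proportional-response dynamics for a linear Fisher market with equal budgets. The equilibrium of that market maximizes Nash social welfare and is envy-free and Pareto efficient, which is one standard way to make a division both fair and efficient.

```python
import numpy as np


def divide(utilities: np.ndarray, iters: int = 200) -> np.ndarray:
    """Split N pixels fractionally among K models (hypothetical criterion).

    utilities[k, j] > 0 is how much model k values pixel j. The rule is a
    stand-in: proportional-response dynamics for a linear Fisher market in
    which every model has a budget of 1. Returns masks of shape (K, N)
    whose columns sum to 1.
    """
    k, n = utilities.shape
    bids = np.full((k, n), 1.0 / n)  # each model spreads its budget evenly
    for _ in range(iters):
        prices = bids.sum(axis=0)  # a pixel's price is the total bid on it
        shares = bids / prices  # each model's fraction of each pixel
        gains = utilities * shares  # utility each model gets from each pixel
        # Re-bid the whole budget (1) in proportion to the utility received.
        bids = gains / gains.sum(axis=1, keepdims=True)
    return bids / bids.sum(axis=0)


def divide_and_denoise(x, targets, steps=50, rate=0.1):
    """Toy loop alternating (i) division and (ii) region-wise denoising.

    targets[k] is a hypothetical stand-in for model k's clean prediction;
    each "denoiser" just moves the sample a fraction `rate` toward it.
    """
    for _ in range(steps):
        # Hypothetical utility: a model prefers pixels that already match it.
        utilities = np.exp(-((x - targets) ** 2))
        masks = divide(utilities)  # (i) divide the sample among the models
        preds = x + rate * (targets - x)  # (ii) every model's denoising step
        x = (masks * preds).sum(axis=0)  # blend outputs by each model's share
    return x


x = divide_and_denoise(np.array([0.2, 0.8]), np.array([[0.0, 0.0], [1.0, 1.0]]))
print(x)  # pixel 0 drifts toward model 0's target, pixel 1 toward model 1's
```

Because the division is recomputed from the current sample at every step, the regions evolve together with the denoising. The masks here are fractional (soft). Taking an argmax over models would give hard regions instead, at the cost of the market's fairness guarantees.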