B-DENSE: Branching For Dense Ensemble Network Supervision Efficiency
Abstract
Inspired by non-equilibrium thermodynamics, diffusion models have achieved state-of-the-art performance in generative modeling. However, they sample iteratively and typically need many steps to maintain image quality, which results in high inference latency. Recent distillation techniques have improved sample quality with fewer steps, but they discard the intermediate steps of the teacher's trajectory. As a result, these methods lose structural information and incur significant discretization errors. To mitigate this issue, we propose B-DENSE, a novel framework that leverages multi-branch trajectory alignment. We modify the student architecture to output K-fold expanded channels. Each channel subset forms a branch corresponding to one discrete intermediate step in the teacher's trajectory, and all branches are trained simultaneously to map to the teacher's full sequence of target timesteps. Because intermediate trajectory alignment is enforced, the student model learns to navigate the solution space from the earliest stages of training. This yields better image generation quality than baseline distillation frameworks.
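The abstract does not specify the alignment objective. The sketch below is therefore a minimal illustration under stated assumptions. It shows how a student output with K-fold expanded channels can be split into K branches, with each branch compared against the teacher's state at the matching intermediate timestep. The function names `split_branches` and `branch_alignment_loss` are hypothetical. The mean-squared-error objective and the uniform per-branch weighting are also assumptions. None of this should be read as the authors' implementation.

```python
import numpy as np


def split_branches(student_out: np.ndarray, k: int) -> np.ndarray:
    """Reshape a (B, K*C, H, W) student output into (B, K, C, H, W).

    With C-order reshaping, branch j takes the contiguous channel
    subset [j*C, (j+1)*C) of the expanded output.
    """
    b, kc, h, w = student_out.shape
    if kc % k != 0:
        raise ValueError(f"channel dim {kc} is not divisible by K={k}")
    return student_out.reshape(b, k, kc // k, h, w)


def branch_alignment_loss(student_out, teacher_targets, weights=None):
    """Hypothetical multi-branch alignment loss (assumed MSE).

    teacher_targets: (B, K, C, H, W), the teacher's states at the K
    intermediate timesteps. Returns the weighted total loss and the
    per-branch MSE values.
    """
    k = teacher_targets.shape[1]
    branches = split_branches(student_out, k)
    if branches.shape != teacher_targets.shape:
        raise ValueError("student branches and teacher targets differ in shape")
    # Average the squared error over batch, channel, and spatial axes,
    # keeping one value per branch (shape (K,)).
    per_branch = ((branches - teacher_targets) ** 2).mean(axis=(0, 2, 3, 4))
    if weights is None:
        # Assumption: every intermediate step contributes equally.
        weights = np.full(k, 1.0 / k)
    return float(np.dot(weights, per_branch)), per_branch


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B, K, C, H, W = 2, 4, 3, 8, 8
    student = rng.standard_normal((B, K * C, H, W))
    teacher = rng.standard_normal((B, K, C, H, W))
    total, per = branch_alignment_loss(student, teacher)
    print(total, per)
```

Because all K branches come from a single forward pass, the student receives supervision on every intermediate step at once instead of only the final target. Under these assumptions, a non-uniform `weights` vector could emphasize particular timesteps.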