The Diffusion Duality
Subham Sahoo · Justin Deschenaux · Aaron Gokaslan · Guanghan Wang · Justin Chiu · Volodymyr Kuleshov
Abstract
Discrete diffusion models have been demonstrated to be surprisingly strong language models. In this work, we show that discrete diffusion language models can be further improved by adapting methods from continuous-state diffusion models. We establish a core property of uniform-state diffusion: it stems from an underlying Gaussian diffusion process. This property allows us to improve both training, via a curriculum learning strategy that reduces training variance and leads to **2×** faster convergence, and sampling, by adapting efficient distillation methods from continuous-state diffusion models. As a result, our models surpass an autoregressive model’s zero-shot perplexity on 3 of 7 benchmarks, and we reduce the number of sampling steps by **two orders of magnitude** while preserving sample quality.
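To make the stated duality concrete, here is a minimal sketch of the intuition: if a Gaussian diffusion corrupts a one-hot token embedding and we then take the argmax, the resulting token interpolates between the clean symbol and a uniform draw over the vocabulary, which is exactly the marginal of a uniform-state discrete diffusion. The specific noise schedule, the VP-style marginal, and the argmax mapping below are illustrative assumptions, not code from the paper.

```python
import numpy as np

def gaussian_latent(x_onehot, alpha_t, rng):
    # Assumed VP-style Gaussian diffusion marginal:
    # w_t = alpha_t * x + sqrt(1 - alpha_t^2) * eps
    eps = rng.standard_normal(x_onehot.shape)
    return alpha_t * x_onehot + np.sqrt(1.0 - alpha_t**2) * eps

def discretize(w_t):
    # Hypothesized duality map: argmax sends the continuous Gaussian
    # latent to a token index.
    return int(np.argmax(w_t))

rng = np.random.default_rng(0)
V, token = 8, 3                  # vocabulary size, clean token id (illustrative)
x = np.eye(V)[token]             # one-hot embedding of the clean token

for alpha_t in (0.99, 0.5, 0.0):
    samples = [discretize(gaussian_latent(x, alpha_t, rng)) for _ in range(20_000)]
    p_keep = np.mean(np.array(samples) == token)
    print(f"alpha_t={alpha_t:.2f}: P(argmax = clean token) ≈ {p_keep:.3f}")

# Near alpha_t = 1 the clean token almost always survives; at alpha_t = 0
# the argmax of i.i.d. Gaussian noise is uniform over the V symbols
# (p ≈ 1/V), matching the uniform-state discrete corruption process.
```

Running this, the probability of recovering the clean token decays smoothly from ≈1 toward 1/V as the Gaussian noise level grows, which is the qualitative behavior the duality describes.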