The Diffusion Duality
Subham Sahoo · Justin Deschenaux · Aaron Gokaslan · Guanghan Wang · Justin Chiu · Volodymyr Kuleshov
Abstract
Discrete diffusion models have been demonstrated to be surprisingly strong language models. In this work, we show that discrete diffusion language models can be further improved by adapting methods from continuous-state diffusion models. We establish a core property of uniform-state diffusion: it stems from an underlying Gaussian diffusion process. This property allows us to improve both training, via a curriculum learning strategy that reduces training variance and leads to **2×** faster convergence, and sampling, by adapting efficient distillation methods from continuous-state diffusion models. As a result, our models surpass an autoregressive model’s zero-shot perplexity on 3 of 7 benchmarks, and we reduce the number of sampling steps by **two orders of magnitude** while preserving sample quality.
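To make the stated duality concrete, here is a minimal sketch of the intuition: if a Gaussian diffusion corrupts a one-hot token embedding and we then take the argmax, the resulting token interpolates between the clean symbol and a uniform draw over the vocabulary, which is exactly the marginal of a uniform-state discrete diffusion. The specific noise schedule, the VP-style marginal, and the argmax mapping below are illustrative assumptions, not code from the paper.

```python
import numpy as np

def gaussian_latent(x_onehot, alpha_t, rng):
    # Assumed VP-style Gaussian diffusion marginal:
    # w_t = alpha_t * x + sqrt(1 - alpha_t^2) * eps
    eps = rng.standard_normal(x_onehot.shape)
    return alpha_t * x_onehot + np.sqrt(1.0 - alpha_t**2) * eps

def discretize(w_t):
    # Hypothesized duality map: argmax sends the continuous Gaussian
    # latent to a token index.
    return int(np.argmax(w_t))

rng = np.random.default_rng(0)
V, token = 8, 3                  # vocabulary size, clean token id (illustrative)
x = np.eye(V)[token]             # one-hot embedding of the clean token

for alpha_t in (0.99, 0.5, 0.0):
    samples = [discretize(gaussian_latent(x, alpha_t, rng)) for _ in range(20_000)]
    p_keep = np.mean(np.array(samples) == token)
    print(f"alpha_t={alpha_t:.2f}: P(argmax = clean token) ≈ {p_keep:.3f}")

# Near alpha_t = 1 the clean token almost always survives; at alpha_t = 0
# the argmax of i.i.d. Gaussian noise is uniform over the V symbols
# (p ≈ 1/V), matching the uniform-state discrete corruption process.
```

Running this, the probability of recovering the clean token decays smoothly from ≈1 toward 1/V as the Gaussian noise level grows, which is the qualitative behavior the duality describes.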