BlockGen: Flexible Blockwise Sequence Modeling with Hybrid Samplers
Abstract
Block Diffusion Models (BDMs) accelerate discrete diffusion by generating the tokens of each block in parallel while supporting KV caching. However, existing BDMs are typically trained with a single, \emph{fixed} block size, which restricts the speed-quality trade-offs available at inference time. Moreover, most BDMs use masked diffusion, where tokens cannot be revised once generated, which limits quality under parallel decoding. We introduce \emph{BlockGen}, a general framework for blockwise sequence modeling that trains a single model over a \emph{set} of block sizes and is compatible with arbitrary block conditionals. In this work, we instantiate BlockGen with \emph{uniform-state} discrete diffusion within each block. BlockGen improves likelihood over fixed block-size training and attains higher sample quality with fewer denoising steps. Training on multiple block sizes also enables hybrid samplers that combine autoregressive and diffusion predictions, substantially improving over pure block-by-block generation while preserving KV caching.