Poster in Workshop: ReALM-GEN: Real-World Constrained and Preference-Aligned Flow- and Diffusion-based Generative Models
Mon, Apr 27, 2026 • 12:00 PM – 12:50 PM PDT

Unifying Autoregressive and Discrete Diffusion Language Modeling via Cross-Regressive Decoding

Dmitry Abulkhanov ⋅ Daniil Strizhakov ⋅ Maxim Panov


Abstract

Inference acceleration can unintentionally change model behavior. This complicates alignment-sensitive deployments, where the effects of post-training (e.g., RLHF) must be preserved. We introduce $\textbf{Cross-Regression}$, a decoding-time method that accelerates generation while providing an explicit mechanism to preserve or relax distributional fidelity. Cross-Regression augments a pretrained autoregressive transformer with a dual-stream design: a frozen control stream computes exact next-token probabilities, and a predictive stream proposes multi-token drafts in parallel. An energy-based acceptance test, derived from the per-token log-probability ratio between the control and predictive streams, determines how many proposed tokens can be safely committed. The method gives explicit control over the trade-off between $\textit{lossless sampling}$ and a faster $\textit{lossy regime}$ whose deviation is controllable. Across models from 1.5B to 70B parameters, we observe strong scaling of acceptance length and realize $3$–$6\times$ speedups with near-complete quality retention on reasoning, code, and dialogue benchmarks. We also demonstrate modality transfer by accelerating Whisper decoding.
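The abstract does not spell out the exact acceptance rule. The following is only a minimal sketch, assuming the per-token energy is the negative log ratio $\log q_i(x_i) - \log p_i(x_i)$ between the predictive stream $q$ and the control stream $p$. For the lossless case it borrows the standard speculative-sampling rule (accept with probability $\min(1, p/q)$, otherwise resample from the residual $\max(0, p - q)$). The function name `cross_regressive_accept`, the slack parameter `tau` used for the lossy regime, and the input layout are hypothetical illustrations, not the authors' implementation.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def _sample(weights: Sequence[float], rng: random.Random) -> int:
    """Draw an index in proportion to a non-negative weight vector."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def cross_regressive_accept(
    draft: Sequence[int],
    control: Sequence[Sequence[float]],
    predictive: Sequence[Sequence[float]],
    tau: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], int]:
    """Return (tokens to commit, number of drafted tokens accepted).

    draft[i]:      token the predictive stream proposed at position i
    predictive[i]: predictive-stream distribution q_i that draft[i] was drawn from
    control[i]:    control-stream distribution p_i; len(control) == len(draft) + 1
                   so one extra token can be drawn after a fully accepted draft
    tau:           hypothetical slack; 0.0 is the lossless test, tau > 0 is lossy
    """
    rng = rng or random.Random()
    committed: List[int] = []
    for i, tok in enumerate(draft):
        p, q = control[i][tok], predictive[i][tok]  # q > 0 since tok was drawn from q_i
        # Per-token energy: negative log ratio of control to predictive probability.
        energy = math.log(q) - math.log(p) if p > 0 else math.inf
        # Accept with probability min(1, exp(tau - energy)) = min(1, e^tau * p / q);
        # clamping the exponent at 0 avoids overflow when p is much larger than q.
        if rng.random() < math.exp(min(0.0, tau - energy)):
            committed.append(tok)
            continue
        # Rejected: draw a correction token from the residual max(0, p - q), which
        # makes the tau = 0 case match the control stream's distribution exactly.
        residual = [max(0.0, pc - qc) for pc, qc in zip(control[i], predictive[i])]
        if sum(residual) == 0.0:  # defensive; should not trigger for normalized inputs
            residual = list(control[i])
        committed.append(_sample(residual, rng))
        return committed, i
    # Whole draft accepted: also take one token from the next control distribution.
    committed.append(_sample(control[len(draft)], rng))
    return committed, len(draft)
```

With `tau = 0.0`, committed tokens follow the control stream's distribution exactly, which is the usual speculative-sampling guarantee. Raising `tau` accepts longer runs at the cost of letting outputs drift from the control stream, loosely mirroring the lossless/lossy control described above.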
