Any-Order Any-Subset AutoRegressive Model
Abstract
We propose Any-order Any-subset Autoregressive modeling (A3), a sequence generation framework that generalizes standard autoregressive (AR) factorization to predict arbitrary token groups in any order. A3 overcomes the limitations of conventional left-to-right decoding by enabling flexible groupwise generation while preserving probabilistic rigor and training stability. Our design combines a two-stream attention architecture with a progressive training strategy, supporting both efficient parallel decoding and robust modeling of diverse dependency structures. Empirical results demonstrate that A3 achieves a better trade-off among generation speed, flexibility, and quality than state-of-the-art AR and diffusion-based methods, offering a unified, flexible, and efficient language modeling paradigm.
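The core idea, factorizing a joint sequence likelihood over arbitrary token groups generated in an arbitrary order, can be illustrated with a minimal sketch. The function and names below (`cond_logprob`, `a3_loglikelihood`, the uniform stand-in conditional) are hypothetical and are not from the paper; a real A3 model would replace `cond_logprob` with a learned two-stream attention network scoring each group's tokens in parallel.

```python
import math

def cond_logprob(group_tokens, context_positions, vocab_size):
    # Hypothetical stand-in conditional: uniform over the vocabulary for
    # each token in the group. A trained model would condition on the
    # tokens at `context_positions` (the groups already generated).
    return -len(group_tokens) * math.log(vocab_size)

def a3_loglikelihood(tokens, groups, order, vocab_size):
    """Log-likelihood under one any-order any-subset factorization.

    tokens: full sequence (list of token ids)
    groups: a partition of position indices, e.g. [[0, 2], [1, 3]]
    order:  a permutation over group indices, e.g. [1, 0]
    """
    total, context = 0.0, []
    for g in order:
        positions = groups[g]
        group_tokens = [tokens[i] for i in positions]
        # Each group is scored jointly (parallel decoding within a group),
        # conditioned on all positions revealed by earlier groups.
        total += cond_logprob(group_tokens, list(context), vocab_size)
        context.extend(positions)
    return total
```

With the uniform stand-in, every choice of partition and order yields the same total log-likelihood, mirroring the requirement that all factorization orders define the same joint distribution.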