Poster
Halton Scheduler for Masked Generative Image Transformer
Victor Besnier · Mickael Chen · David Hurych · Eduardo Valle · Matthieu Cord
Hall 3 + Hall 2B #171
Masked Generative Image Transformers (MaskGIT) have emerged as a scalable and efficient image-generation framework, able to deliver high-quality visuals at low inference cost. However, MaskGIT's token unmasking scheduler, an essential component of the framework, has not received the attention it deserves. We analyze the sampling objective in MaskGIT, based on the mutual information between tokens, and elucidate its shortcomings. We then propose a new sampling strategy based on our Halton scheduler instead of the original Confidence scheduler. More precisely, our method selects each token's position according to a quasi-random, low-discrepancy Halton sequence. Intuitively, this spreads the tokens spatially, progressively covering the image uniformly at each step. Our analysis shows that it reduces non-recoverable sampling errors, leading to simpler hyperparameter tuning and better image quality. Our scheduler requires no retraining or noise injection and can serve as a simple drop-in replacement for the original sampling strategy. Evaluations of class-to-image synthesis on ImageNet and text-to-image generation on the COCO dataset demonstrate that the Halton scheduler outperforms the Confidence scheduler quantitatively, by reducing the FID, and qualitatively, by generating more diverse and more detailed images. Our code is at https://github.com/valeoai/Halton-MaskGIT.
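To make the scheduling idea concrete, the following is a minimal sketch (not the authors' released implementation; see the linked repository for that) of how a 2D Halton sequence, built from the radical inverse in bases 2 and 3, can be quantized onto a token grid to produce a spatially spread-out unmasking order:

```python
def halton(index: int, base: int) -> float:
    """Radical inverse of `index` in the given base (van der Corput sequence)."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result


def halton_token_order(height: int, width: int) -> list:
    """Order the positions of a height x width token grid by a 2D Halton
    sequence (bases 2 and 3), skipping positions already visited.

    Because the Halton sequence is low-discrepancy, each prefix of the
    returned order covers the grid roughly uniformly.
    """
    order, seen = [], set()
    i = 1
    while len(order) < height * width:
        y = int(halton(i, 2) * height)
        x = int(halton(i, 3) * width)
        i += 1
        if (y, x) not in seen:
            seen.add((y, x))
            order.append((y, x))
    return order
```

At each MaskGIT sampling step, the next chunk of this precomputed order would give the token positions to unmask, in contrast to the Confidence scheduler, which picks positions by predicted confidence and tends to cluster them.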