Invited Talk
Good token representation improves the creativity of generative models
Dianbo Liu
in Workshop: Second Workshop on Representational Alignment (Re²-Align)
Discrete tokenization has become the default bridge between continuous data and modern generative models—from vision‑language transformers to diffusion networks. Yet the creative capacity of these models—their ability to produce diverse, high‑quality outputs—critically depends on how well a token’s representation aligns with the underlying data manifold. We demonstrate that poor alignment constrains the model to a narrow subset of modes, whereas well‑aligned codes preserve latent structure and invite exploration beyond the training distribution. Drawing on empirical studies in vision and language generation, this talk dissects the mechanisms by which token quality influences creativity, reviews diagnostics for measuring alignment, and outlines three practical strategies: representation‑aware codebooks, adaptive token refinement during training, and hybrid continuous–discrete decoders. By treating tokenization not as a preprocessing convenience but as a first‑class design choice, we show how to unlock richer, more imaginative generative behavior across domains.
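The abstract mentions diagnostics for measuring alignment without naming them. One widely used proxy in the discrete-tokenization (VQ) literature is codebook usage perplexity: the exponentiated entropy of how often each code is selected. The Python sketch below is illustrative only and is not taken from the talk; the function name codebook_perplexity and the toy numbers are hypothetical, and the metric is assumed here as a stand-in for whatever diagnostics the speaker presents.

import numpy as np

def codebook_perplexity(code_indices: np.ndarray, codebook_size: int) -> float:
    """Perplexity of empirical code usage: exp(entropy of usage frequencies).

    Ranges from 1 (a single code absorbs all inputs, i.e. the "narrow
    subset of modes" failure described above) up to codebook_size
    (uniform usage across the codebook).
    """
    counts = np.bincount(code_indices.ravel(), minlength=codebook_size)
    probs = counts / counts.sum()
    nonzero = probs[probs > 0]
    entropy = -np.sum(nonzero * np.log(nonzero))
    return float(np.exp(entropy))

# Toy comparison: 10,000 tokens drawn from a 512-entry codebook.
rng = np.random.default_rng(0)
collapsed = rng.integers(0, 8, size=10_000)    # only 8 codes ever fire
healthy = rng.integers(0, 512, size=10_000)    # broad, near-uniform usage
print(codebook_perplexity(collapsed, 512))     # ~8: poorly aligned codebook
print(codebook_perplexity(healthy, 512))       # ~512: well-utilized codebook

A perplexity far below the codebook size signals that most codes are dead and the model is confined to few modes, which is one concrete way the token-quality-versus-creativity link in the abstract could be quantified.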