Invited Talk by Aditi Raghunathan: The Creative Limits of Next-token Prediction
Abstract
Current LLM pipelines rely on a convenient illusion: that scaling next-token prediction and tweaking temperature naturally unlocks diverse, open-ended generation. In reality, standard autoregression is fundamentally myopic. We quantify this using minimal algorithmic tasks that require far-sighted stochastic planning. In these environments, next-token learning fails to plan, whereas multi-token approaches excel. Furthermore, standard output-layer temperature sampling degrades coherence in its attempt to elicit randomness. Surprisingly, simply injecting noise directly at the input layer (seed-conditioning) works as well as temperature sampling, if not better. The same diversity collapse plagues test-time compute scaling for math reasoning, where standard decoding merely regurgitates redundant errors. Yet applying a simple mode-conditioning (ModC) prefix forces the model to explore distinct reasoning paths, yielding a 4x efficiency gain.
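To make the contrast between the two sources of randomness concrete, here is a minimal sketch in plain Python. It is not the speaker's implementation. It assumes that seed-conditioning takes the form of a random token prefix prepended to the prompt, followed by deterministic (greedy) decoding. The function names, the toy logits, and the vocabulary size are all hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_with_temperature(logits, temperature, rng):
    """Output-layer randomness: draw a token index from the tempered distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


def greedy(logits):
    """Deterministic decoding: pick the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def seed_conditioned_prompt(prompt_tokens, seed_length, vocab_size, rng):
    """Input-layer randomness (assumed form): prepend a random token prefix.

    The model would then be decoded greedily on this prompt, so all of the
    randomness comes from the prefix rather than from the sampling step.
    """
    seed = [rng.randrange(vocab_size) for _ in range(seed_length)]
    return seed + list(prompt_tokens)


if __name__ == "__main__":
    rng = random.Random(0)
    toy_logits = [2.0, 1.0, 0.0]  # hypothetical next-token scores
    print(softmax(toy_logits, temperature=0.5))
    print(softmax(toy_logits, temperature=2.0))
    print(sample_with_temperature(toy_logits, 1.0, rng))
    print(seed_conditioned_prompt([7, 8, 9], seed_length=4, vocab_size=50, rng=rng))
```

In this sketch, raising the temperature flattens the per-token distribution, so randomness is injected independently at every decoding step. The seed-conditioned variant fixes its randomness once, before decoding starts, and leaves every subsequent step deterministic.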