

Poster

Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs

Minh Nguyen · Andrew Baker · Clement Neo · Allen Roush · Andreas Kirsch · Ravid Shwartz-Ziv

Hall 3 + Hall 2B #237
[ Project Page ]
Thu 24 Apr midnight PDT — 2:30 a.m. PDT
 
Oral presentation: Oral Session 1B
Wed 23 Apr 7:30 p.m. PDT — 9 p.m. PDT

Abstract:

Large Language Models (LLMs) generate text by sampling the next token from a probability distribution over the vocabulary at each decoding step. However, popular sampling methods like top-p (nucleus sampling) often struggle to balance quality and diversity, especially at higher temperatures, leading to incoherent or repetitive outputs. To address this challenge, we propose min-p sampling, a dynamic truncation method that adjusts the sampling threshold based on the model's confidence by scaling according to the top token's probability. We conduct extensive experiments on benchmarks including GPQA, GSM8K, and AlpacaEval Creative Writing, demonstrating that min-p sampling improves both the quality and diversity of generated text, particularly at high temperatures. Moreover, human evaluations reveal a clear preference for min-p sampling in terms of both text quality and diversity. Min-p sampling has been adopted by leading open-source LLM implementations including Hugging Face, vLLM, and many others, highlighting its practical utility and potential impact.
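The following is a minimal sketch of the dynamic truncation rule described in the abstract: tokens whose probability falls below a fraction of the top token's probability are discarded before sampling. The parameter name p_base and the function are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def min_p_sample(logits: np.ndarray, p_base: float = 0.1, temperature: float = 1.0,
                 rng: np.random.Generator | None = None) -> int:
    """Sample a token index using min-p style truncation (illustrative sketch)."""
    rng = rng or np.random.default_rng()
    # Apply temperature, then convert logits to probabilities.
    scaled = logits / temperature
    scaled -= scaled.max()              # for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    # Dynamic threshold: scales with the model's confidence in its top token.
    threshold = p_base * probs.max()
    kept = np.where(probs >= threshold, probs, 0.0)
    kept /= kept.sum()                  # renormalize the surviving tokens
    return int(rng.choice(len(kept), p=kept))

# Toy usage over a 5-token vocabulary.
logits = np.array([2.0, 1.5, 0.2, -1.0, -3.0])
token = min_p_sample(logits, p_base=0.1, temperature=1.5)
```

Because the threshold tracks the top token's probability, the truncation is stricter when the model is confident and more permissive when the distribution is flat, which is what allows higher temperatures without losing coherence.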
