
Poster in Workshop: Quantify Uncertainty and Hallucination in Foundation Models: The Next Frontier in Reliable AI

Rethinking Uncertainty Estimation in Natural Language Generation

Lukas Aichberger · Kajetan Schweighofer · Sepp Hochreiter

Keywords: [ LLM ] [ uncertainty estimation ] [ proper scoring rules ] [ NLL ] [ uncertainty measures ] [ NLG ] [ G-NLL ]


Abstract:

Large Language Models (LLMs) are increasingly employed in real-world applications, driving the need to evaluate the trustworthiness of their generated text. To this end, reliable uncertainty estimation is essential. Leading uncertainty estimation methods generate and analyze multiple output sequences, which is computationally expensive and impractical at scale. In this work, we examine the theoretical foundations of these methods and explore new directions to enhance computational efficiency. Building on the framework of proper scoring rules, we find that the negative log-likelihood of the most likely output sequence constitutes a theoretically grounded uncertainty measure. To approximate this alternative measure, we propose G-NLL, computed from a single output sequence obtained via greedy decoding. This approach streamlines uncertainty estimation while preserving theoretical rigor. Empirical results demonstrate that G-NLL achieves state-of-the-art performance across various LLMs and tasks. Our work lays the foundation for efficient and reliable uncertainty estimation in natural language generation, challenging the necessity of prevalent methods that are more complex and resource-intensive.
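The core computation behind G-NLL is simple to sketch: decode one sequence greedily and sum the negative log-likelihoods of its tokens. The snippet below is a minimal illustration assuming the Hugging Face transformers API; the model name "gpt2" and the helper name g_nll are placeholders, and this is a sketch of the idea rather than the authors' implementation.

```python
# Minimal sketch of G-NLL, assuming the Hugging Face transformers API.
# Illustrative only, not the authors' code; "gpt2" is a placeholder model
# and g_nll is a hypothetical helper name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM; chosen here only for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def g_nll(prompt: str, max_new_tokens: int = 50) -> float:
    """Negative log-likelihood of the single greedily decoded output."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,               # greedy decoding: one sequence
            return_dict_in_generate=True,
            output_scores=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Per-token log-probabilities of the generated tokens under the model.
    token_logprobs = model.compute_transition_scores(
        out.sequences, out.scores, normalize_logits=True
    )
    # G-NLL: sum of negative log-likelihoods over the generated tokens;
    # a higher value indicates greater uncertainty about the output.
    return -token_logprobs[0].sum().item()

# Example usage: compare G-NLL across prompts to rank the model's
# confidence in its own answers.
print(g_nll("The capital of Austria is"))
```

Because only one forward generation pass is required, this avoids the cost of sampling and scoring many output sequences, which is what makes the measure practical at scale.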
