Enhancing Trust in Large Language Models via Uncertainty-Calibrated Fine-tuning
Abstract
Large language models (LLMs) have achieved remarkable success in natural language generation, yet they remain prone to hallucination and are often overconfident, assigning high confidence even to incorrect outputs. Reliable uncertainty estimation is a key requirement for deploying LLMs safely: it enables users and downstream systems to assess confidence, detect hallucinations, and identify out-of-domain prompts. In this work, we propose an uncertainty-calibrated fine-tuning approach that improves the reliability of LLMs in open-ended, free-form generation settings. Our method introduces a novel uncertainty-aware causal language modeling loss, grounded in decision-theoretic principles, that explicitly encourages well-calibrated uncertainty estimates. Extensive empirical evaluations across multiple free-form question-answering datasets and model architectures demonstrate that our approach consistently yields better uncertainty calibration than standard fine-tuning. Furthermore, our experiments show that the proposed method substantially enhances a model's ability to detect hallucinations and identify out-of-domain prompts.
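To make the idea concrete, the following is a minimal sketch of one way an uncertainty-aware causal language modeling loss could be instantiated: standard next-token cross-entropy augmented with a Brier-style penalty, which is a proper scoring rule and therefore rewards calibrated confidence. The abstract does not specify the paper's actual objective, so the formulation, function name, and `calib_weight` parameter below are illustrative assumptions, not the method itself.

```python
# Illustrative sketch only: a calibration-penalized causal LM loss.
# All names (uncertainty_aware_clm_loss, calib_weight) are hypothetical;
# the paper's exact decision-theoretic objective is not given in the abstract.
import torch
import torch.nn.functional as F


def uncertainty_aware_clm_loss(logits, targets, calib_weight=0.5):
    """logits: (batch, seq_len, vocab); targets: (batch, seq_len) token ids."""
    vocab = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)

    # Standard causal LM term: negative log-likelihood of the observed token.
    nll = F.nll_loss(log_probs.view(-1, vocab), targets.view(-1))

    # Brier-style calibration term: squared distance between the predicted
    # next-token distribution and the one-hot target. Because the Brier score
    # is a proper scoring rule, minimizing it discourages overconfident
    # probability mass on incorrect tokens.
    probs = log_probs.exp()
    one_hot = F.one_hot(targets, num_classes=vocab).to(probs.dtype)
    brier = ((probs - one_hot) ** 2).sum(dim=-1).mean()

    return nll + calib_weight * brier
```

In use, the logits would be the model's predictions shifted against the input ids in the usual causal LM way, e.g. `uncertainty_aware_clm_loss(out.logits[:, :-1], ids[:, 1:])`.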