Efficient Hallucination Detection in Automatic Code Generation
Abstract
Large language models (LLMs) frequently produce source code that appears correct and well-formed yet contains hallucinated elements that cause downstream test failures. While uncertainty quantification (UQ) methods have shown promise for hallucination detection in natural language, their effectiveness in code generation remains largely unexplored. In this work, we investigate the performance of state-of-the-art UQ methods for hallucination detection in source code generation and propose an efficient and effective training-based approach. We develop a diff-based pipeline to construct a code dataset annotated with line-level LLM hallucinations, enabling systematic benchmarking of hallucination detection methods. Using this pipeline, we build a large-scale annotated dataset and train a lightweight Transformer-based hallucination detector that takes the LLM's internal representations as input features. Experimental results across diverse code generation domains demonstrate that the detector substantially outperforms existing approaches in line-level hallucination detection. We also highlight the detector's potential use in LLM-based coding agents, where it could enable self-correction and reduce the generation of erroneous code. We release the first publicly available dataset of line-level code hallucinations, along with the corresponding source code and trained hallucination detectors: https://github.com/datapaf/CodeHallucinationDetection
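To illustrate what diff-based line-level annotation can look like, a minimal sketch follows. It rests on assumptions not stated in the abstract: that each generated program is paired with a corrected reference version, and that any generated line with no exact match in the reference is labeled as hallucinated. The function name `label_lines` and the example snippets are hypothetical and are not taken from the released pipeline.

```python
import difflib


def label_lines(generated: str, reference: str) -> list[tuple[str, int]]:
    """Pair each generated line with a label: 1 = no match in reference, 0 = kept.

    Assumption: a line that the diff does not align with the reference
    is treated as hallucinated. The actual pipeline may label differently.
    """
    gen_lines = generated.splitlines()
    ref_lines = reference.splitlines()

    # Align the two line sequences; autojunk is disabled so that frequent
    # lines (e.g. blank lines) are not silently ignored on long files.
    matcher = difflib.SequenceMatcher(a=gen_lines, b=ref_lines, autojunk=False)

    # Start by flagging every generated line, then clear the aligned ones.
    labels = [1] * len(gen_lines)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            labels[i] = 0

    return list(zip(gen_lines, labels))


# Hypothetical example: the generated code calls a function that does not exist.
generated = "import os\nx = os.list_files('.')\nprint(x)"
reference = "import os\nx = os.listdir('.')\nprint(x)"

labeled = label_lines(generated, reference)
for line, label in labeled:
    print(label, line)
```

Labels of this form (one binary label per generated line) are the kind of supervision a line-level detector could be trained and benchmarked on; exact-match alignment is the simplest choice and would flag lines that differ only in whitespace.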