DafnyLLM: Pre-training Dafny Representations with Large Language Models for Code Verification
Abstract
Formal verification requires a deep integration of program semantics and logical specifications, posing challenges beyond those addressed by general-purpose code language models. In particular, the Dafny language introduces a tightly coupled syntax of implementation and specification, demanding models that can capture both procedural structure and verification intent. In this work, we propose DafnyLLM, the first pre-trained large language model tailored for the Dafny verification language. DafnyLLM leverages hybrid structural priors, including abstract syntax trees, control/data flow graphs, and assertion flow graphs, to encode rich program and specification semantics. To support Dafny’s long-range dependencies across intertwined definitions, specifications, and proofs, we design a sparse attention mechanism with memory and bridge tokens, enabling efficient and scalable modeling of verification contexts. We pre-train DafnyLLM on a curated corpus of real-world Dafny programs and evaluate it on verification-aware downstream tasks, such as specification-conditioned code retrieval and assertion classification. Experimental results show that DafnyLLM outperforms state-of-the-art code models by a significant margin, demonstrating the importance of incorporating verification-specific structure in code representation learning.