ProofRepairBench: Exploring Proof Repair in Lean
Abstract
Proof repair is an emerging task in AI systems for formal theorem proving: given an incorrect proof and an error message from the underlying proof assistant, the goal is to generate a correct proof. As AI proving systems shift towards whole-proof generation, the rigor of proof-checking kernels makes it such that one-shot-generated proofs are rarely error-free. However, they may be easily repairable, which makes the proof repair task a crucial component of AI proving systems and an important research target. To study the proof task in isolation, we present RepairBench---a benchmark of 127 challenging repair problems in Lean. We use this benchmark to evaluate a set of language models coupled with various meta-generation strategies for proof repair and draw a number of practical takeaways. We release the benchmark to make it a useful for to community to advance AI proving systems.