HARDTESTGEN: A High-Quality RL Verifier Generation Pipeline for LLM Algorithmic Coding
Abstract
Verifiers provide important reward signals for reinforcement learning (RL) of large language models (LLMs). However, building reliable verifiers is challenging, especially for code generation tasks: a subtly wrong solution program may only be exposed by carefully crafted, human-written edge cases that are difficult to synthesize automatically. To address this issue, we propose HardTestGen, a pipeline for synthesizing high-quality test cases for algorithmic coding problems. Using it, we curate HardTests, a comprehensive algorithmic programming dataset with 26.6k problems and high-quality synthetic tests. Compared with existing tests, HardTestGen tests verify LLM-generated code significantly more accurately (+11.22 percentage points in precision, i.e., the fraction of code predicted to be correct that is actually correct). We also show that downstream post-training with the HardTests verifier, including rejection sampling and RL, improves LLM code generation performance.
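For concreteness, the precision metric referenced above can be stated as follows; this is the standard definition of precision, written here under the reading that "predicted correct" means a program passes all of a verifier's test cases:

\[
\text{Precision} \;=\; \frac{\big|\{\text{programs that pass the verifier's tests and are actually correct}\}\big|}{\big|\{\text{programs that pass the verifier's tests}\}\big|}
\]

Higher precision means fewer wrong programs are mistakenly rewarded, which is what makes a verifier useful as an RL reward signal.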