ε-Leaf Enumeration: Non-Repeating Self-Consistency via Truncated Tree Search
Abstract
Self-consistency boosts inference-time performance by sampling multiple reasoning traces in parallel and voting. However, in constrained domains like math and code, this strategy is compute-inefficient because it samples with replacement, repeatedly generating the same early prefixes. We show that this can be fully replaced by a deterministic enumeration of completions. While enumerating exponentially many completions would become infeasible for standard samplers, we show that, when the search space is truncated to tokens above a probability threshold, partial enumeration of the most likely or the most diverse branches is significantly more efficient. We propose ε-Leaf Enumeration (ELE), a deterministic tree-search method that imposes a strict non-repetition condition, covering the search space more optimally than via sampling. ELE naturally enables prefix reuse by caching shared prefixes across branches to redundant computation. Empirically, at equal token budgets, ELE explores more distinct completions than self-consistency which translates into consistent improvements in math, coding and general reasoning tasks through efficient parallel reasoning.