Distilling SMT Solver Reasoning into Compact Language Models
Abstract
We conduct an extensive and comparative analysis of compact language models with fewer than 10B parameters in the domain of Satisfiability Modulo Theories (SMT). While recent works have shown that billion-parameter models can perform logical reasoning, their application to formal SMT solvers like Z3 \cite{de2008z3} remains underexplored, particularly regarding the trade-off between translation (parsing Natural Language to SMT) and execution (imitating solver reasoning traces). Additionally, while state-of-the-art commercial LLMs can perform strongly on SMT-type problems, the accuracies of smaller models on such tasks remain particularly low due to their limited capacity in arithmetic reasoning tasks. We explore whether it is possible to create a new lightweight language model that can perform well on SMT-based arithmetic and boolean reasoning tasks. Our contribution is a rigorous evaluation of whether resource-constrained models can serve as reliable interfaces for formal verification tools, either by approximating the internal reasoning steps of a solver and accurately predicting the satisfiability of the problem without calling external tools or by correctly formalizing natural language constraints and parsing them correctly in an SMT problem format that a solver like Z3 can solve. Additionally, we evaluate how supervised solver guided training transfers to a related real-world analytical reasoning task in order to assess the limitations of our approach.