Benchmarking Augmentation Strategies for LLM-Based Solid-State Synthesis Prediction
Abstract
Identifying synthesis recipes for new inorganic materials remains a major bottleneck in materials discovery. We investigate whether large language models (LLMs) can improve solid-state synthesis prediction through three augmentation strategies: retrieval-augmented generation (RAG) from the literature, domain-specific thermodynamic tools, and multi-step test-time compute workflows such as debate, self-reflection, and sequential pipelines. On 674 literature-derived targets, we find that retrieving relevant synthesis precedents is the most effective strategy, improving top-10 precursor accuracy from 77.0\% to 83.5\%. Thermodynamic tools alone also improve performance (80.6\%), but provide little additional benefit when RAG is already used (82.9\% with Gemini 3 Flash, 77.5\% with Gemini 2 Flash). By contrast, test-time compute does not improve performance, and sequential multi-agent workflows often reduce accuracy: errors introduced in earlier stages propagate downstream, causing later steps to mis-rank candidates or overwrite correct answers. Our results show that, for solid-state synthesis prediction, providing models with relevant domain information is more effective than increasing test-time compute through multi-agent deliberation.