SynReason: Enhancing Synthesis Reasoning via Reinforcement Learning Experimental Feedback
Abstract
Materials discovery efforts have produced millions of candidate inorganic compounds, yet identifying practical synthesis routes remains a major bottleneck. The reasoning capabilities of LLMs are promising, but their reasoning traces can yield poor synthesis prediction accuracy because they are misaligned with experimental data. We introduce Reinforcement Learning Experimental Feedback (RLEF), a simple RL post-training approach that samples multiple candidate completions per prompt, scores them with an experimental alignment reward, computes group-normalized advantages, and applies updates regularized by a KL penalty to a reference model. We demonstrate our method on a materials precursor recommendation task: given a target composition, generate a ranked list of precursor sets that match experimental syntheses. Using RLEF, we develop SYNREASON, a synthesis planning LLM capable of chemical reasoning. Across different model sizes, RLEF redefines the accuracy–speed Pareto front. A 4B model post-trained with RLEF outperforms even base models that are 8× larger across all Top-K metrics. Interestingly, in contrast to other RL post-training methods, RLEF shortens reasoning traces rather than lengthening them, while simultaneously improving performance. This reasoning compression suggests that RLEF induces more selective and task-relevant chemical reasoning in LLMs.
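To make the update described above concrete, the following is a minimal sketch of group-normalized advantages and a KL-regularized surrogate objective for one prompt's group of sampled completions. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `eps` stabilizer, the value of `beta`, the simple log-ratio KL estimate, and the toy 0/1 rewards are all hypothetical, since the abstract does not specify the reward or the exact loss.

```python
import math
from typing import List, Sequence


def group_normalized_advantages(
    rewards: Sequence[float], eps: float = 1e-8
) -> List[float]:
    """Center and scale the rewards of the completions sampled for ONE prompt.

    `eps` is an assumed stabilizer that avoids division by zero when every
    completion in the group receives the same reward.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def kl_regularized_objective(
    advantages: Sequence[float],
    logp_policy: Sequence[float],
    logp_ref: Sequence[float],
    beta: float = 0.04,  # hypothetical penalty weight, not taken from the paper
) -> float:
    """Surrogate to maximize: mean(A_i * log pi(y_i)) - beta * KL estimate.

    The KL term uses the simple sample estimate mean(log pi - log pi_ref)
    over completions drawn from the current policy (an assumption).
    """
    n = len(advantages)
    policy_term = sum(a * lp for a, lp in zip(advantages, logp_policy)) / n
    kl_estimate = sum(lp - lr for lp, lr in zip(logp_policy, logp_ref)) / n
    return policy_term - beta * kl_estimate


# Toy group of four completions for one target composition. A reward of 1.0
# stands in for "predicted precursor set matches an experimental synthesis".
toy_rewards = [1.0, 0.0, 0.0, 1.0]
toy_advantages = group_normalized_advantages(toy_rewards)
```

Because advantages are normalized within each group, a completion is reinforced only relative to the other samples for the same prompt. A group in which every completion earns the same reward therefore contributes zero advantage.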