Beyond Rationalization: Criteria and Guidelines for Algorithmic Reasoning Traces in LLM Logical Reasoning
Abstract
Chain-of-thought (CoT) prompting is now a standard way to elicit “reasoning” from large language models, but recent work shows that CoT can hurt accuracy on some pattern-based in-context learning tasks and often produces explanations that look reasonable but do not match how the model actually made its decision. At the same time, symbolic and neuro-symbolic approaches such as Faithful CoT, SymbCoT, and Logic-LM connect natural-language traces to executable formalisms and obtain higher accuracy and verifiable faithfulness on logical benchmarks. In this position paper, we argue that for logical reasoning it is important to separate linguistic rationalization, where CoT mainly describes an opaque pattern-matching process, from algorithmic reasoning, where the trace corresponds to a concrete computation that a solver can run. We propose four simple criteria for treating a CoT trace as algorithmic reasoning in logic-focused tasks, and we suggest a three-condition evaluation protocol that compares direct answering, free-form CoT, and symbolic or neuro-symbolic CoT, together with lightweight checks of faithfulness and computational cost. The goal is to provide concrete guidance on when CoT should be treated only as a narrative aid and when logical reasoning research should instead require solver-backed, verifiable reasoning traces.