Cognitive Foundations for Reasoning and Their Manifestation in LLMs
Abstract
Large language models (LLMs) solve complex problems yet fail on simpler variants, suggesting they achieve correct outputs through mechanisms fundamentally different from human reasoning. To understand this gap, we synthesize cognitive science research into a taxonomy of 28 cognitive elements spanning reasoning invariants, meta-cognitive controls, knowledge representations, and transformation operations. We conduct the first large-scale empirical analysis of 192K reasoning traces from 18 models across text, vision, and audio modalities, complemented by 54 human think-aloud traces. Our analysis reveals a fundamental misalignment: models narrow to rigid sequential processing on ill-structured problems, precisely where diverse representations and meta-cognitive monitoring correlate most strongly with success. Human traces show more abstraction and conceptual processing, while models default to surface-level enumeration. Leveraging these behavioral patterns, we develop test-time reasoning guidance that scaffolds successful cognitive structures, improving performance by up to 26.7% on complex problems. This confirms that models possess latent reasoning capabilities but fail to deploy them spontaneously. Our framework establishes a shared vocabulary between cognitive science and LLM research, enabling systematic diagnosis of reasoning failures and principled development of models that reason through robust cognitive mechanisms rather than spurious shortcuts.