Decoupling Reasoning from Action: Architectural Impacts on Agentic Planning Consistency
Abstract
Large Language Models (LLMs) serving as autonomous agents often conflate reasoning (planning) with action (tool execution) within a single context window. We investigate how context architecture impacts logical consistency and planning effectiveness through six ablation studies using MCPBench with GPT-4o, evaluating monolithic, partially partitioned, and fully role-separated architectures under 6-25 concurrent distractors. We find that monolithic architectures maximize raw efficiency and fine-grained accuracy (Dependency Awareness 5.95 vs. 5.67; Parameter Accuracy 8.08 vs. 7.12), while role-separated architectures offer superior scaling stability (1.38x vs. 3.24x latency degradation). These findings reveal a "context-reasoning trade-off" that challenges the assumption that modular agents are universally superior.