LOOK BEFORE YOU LEAP: THERMODYNAMIC ARBITRATION OF PARAMETRIC AND NON-PARAMETRIC KNOWLEDGE IN LLM AGENTS VIA SELF-REGULATING MEMORY ARCHITECTURES
Abstract
While Large Language Models (LLMs) possess rich implicit reasoning capabilities in their residual streams, current agentic architectures force these continuous latent states to collapse prematurely into explicit retrieval operations. We argue that this “Open-Loop” reliance on external memory bypasses the model’s intrinsic capacity to evaluate its own epistemic boundaries. In this paper, we propose MARTA (Metacognitive Adaptive Retrieval and Thought Architecture), which operationalizes the concept of “Latent Metacognition” by extracting a thermodynamic signature directly from the hidden states of the frozen model. Rather than treating retrieval as a mandatory discrete action, MARTA defines it as a function of the implicit entropy landscape. We define an introspective latent vector $u(h_L) \in \mathbb{R}^d$ that encodes the “Epistemic Energy” of the current thought. Non-parametric memory access is gated by a decision boundary in this latent space. This enables the agent to engage in “Implicit Thinking,” that is, to evaluate the necessity of external help within its own residual stream before committing to the computational cost of retrieval. Our experiments show that this latent regulation allows models to “Look (internally) Before They Leap (externally),” rejecting 87.6% of adversarial distractors and reducing token overhead by 38.4% without sacrificing reasoning fidelity.
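To make the gating idea concrete, the following is a minimal sketch rather than the MARTA implementation. It assumes, purely for illustration, that the introspective vector is a linear projection of the final-layer hidden state and that the decision boundary is a single hyperplane. The names `introspective_vector`, `should_retrieve`, `proj`, `w`, and `b` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def introspective_vector(
    hidden: Sequence[float], proj: Sequence[Sequence[float]]
) -> List[float]:
    """Hypothetical u(h_L): project the final-layer hidden state into R^d.

    ASSUMPTION: a plain linear map stands in for whatever extraction
    the actual architecture uses.
    """
    return [sum(p * h for p, h in zip(row, hidden)) for row in proj]


def should_retrieve(u: Sequence[float], w: Sequence[float], b: float) -> bool:
    """Gate retrieval with a linear decision boundary in the latent space.

    ASSUMPTION: the boundary is the hyperplane w . u + b = 0, and
    retrieval is triggered only on its positive side.
    """
    score = sum(wi * ui for wi, ui in zip(w, u)) + b
    return score > 0.0


# Toy usage with an identity projection and hand-picked boundary parameters.
identity = [[1.0, 0.0], [0.0, 1.0]]
w, b = [1.0, -1.0], -0.5

uncertain = introspective_vector([1.0, 0.0], identity)
confident = introspective_vector([0.0, 1.0], identity)

retrieve_uncertain = should_retrieve(uncertain, w, b)  # positive side: retrieve
retrieve_confident = should_retrieve(confident, w, b)  # negative side: skip
```

In this sketch the frozen model is untouched: only the projection and the boundary parameters would need to be fit, and retrieval cost is paid only when the gate fires.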