Poster in Workshop: Agents in the Wild: Safety, Security, and Beyond

LOOK BEFORE YOU LEAP: THERMODYNAMIC ARBITRATION OF PARAMETRIC AND NON-PARAMETRIC KNOWLEDGE IN LLM AGENTS VIA SELF-REGULATING MEMORY ARCHITECTURES

Akash Das


Abstract

As Large Language Model (LLM) agents are deployed in "wild" environments, they face the critical threat of Context Poisoning, in which irrelevant or adversarial retrieval results induce hallucinations and derail reasoning. Current "Open-Loop" agents, which retrieve indiscriminately (P(act) ≈ 1), lack an immune system to reject these toxic inputs. In this work, we introduce MARTA (Metacognitive Adaptive Retrieval and Thought Architecture), a safety-critical control layer that establishes a Thermodynamic Firewall between the agent and its memory. We model the decision to ingest external context not as a default behavior but as a risk-aware arbitration based on the frozen backbone's "Epistemic Signature" u(x). We introduce the Discriminative Cliff metric, which quantifies an agent's ability to distinguish between high-similarity distractors and high-utility evidence. Our evaluation on the Adversarial Alignment Protocol demonstrates that MARTA achieves a Discriminative Cliff of +87.4, rejecting 87.6% of adversarial traps that successfully poisoned baseline agents. By forcing the model to "Look (at its own uncertainty) Before It Leaps (into external data)," MARTA provides the necessary epistemic regulation for robust, safe, and reliable autonomous operation.
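To make the contrast between open-loop retrieval and uncertainty-gated retrieval concrete, the following is a minimal sketch and not MARTA itself. The abstract does not define u(x), so the sketch assumes, purely for illustration, that the signal is the Shannon entropy of the backbone's next-token distribution. The function names, the threshold value, and the `retrieve` callback are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution.

    Assumption: stands in for the abstract's u(x), which is not defined there.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_retrieve(probs: Sequence[float], threshold: float = 1.0) -> bool:
    """Gate retrieval: consult external memory only when uncertainty is high.

    The threshold of 1.0 nat is an arbitrary illustrative value.
    """
    return predictive_entropy(probs) > threshold


def arbitrate(
    probs: Sequence[float],
    retrieve: Callable[[], List[str]],
    threshold: float = 1.0,
) -> Tuple[str, List[str]]:
    """Return which knowledge source was used and any retrieved context."""
    if should_retrieve(probs, threshold):
        # High uncertainty: fall back on non-parametric (retrieved) knowledge.
        return ("retrieved", retrieve())
    # Low uncertainty: trust parametric knowledge and skip retrieval entirely,
    # so a poisoned document never enters the context.
    return ("parametric", [])


if __name__ == "__main__":
    confident = [0.97, 0.01, 0.01, 0.01]
    uncertain = [0.25, 0.25, 0.25, 0.25]
    print(arbitrate(confident, lambda: ["doc"]))
    print(arbitrate(uncertain, lambda: ["doc"]))
```

An open-loop agent corresponds to calling `retrieve()` unconditionally. In this sketch the gate skips retrieval for the confident distribution and triggers it for the uniform one.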
