Poster in Workshop: ICLR 2026 Workshop on Memory for LLM-Based Agentic Systems (MemAgents)
Mon, Apr 27, 2026 • 5:50 AM – 6:35 AM PDT

ShiftBench: Measuring Recovery of Agent Memory Under Distribution Shift

Teresa Zhang

Abstract

Selecting memory policies by long-horizon accuracy can be misleading under shift, because rankings may reverse when evaluated by post-shift recovery. We introduce ShiftBench, a lightweight protocol defining shift segments and Recovery@T on LoCoMo and HaluMem-Long. On LoCoMo, lexical baselines (TF-IDF methods) show reversal under interruption (Spearman $\rho=-0.30$, inversion $0.60$), and alignment drops from $0.94$ to $0.70$ ($\Delta \rho=0.24$, 95% CI $[0.12, 0.37]$). On HaluMem-Long, reversal is smaller but still present ($\rho=0.02$, inversion $0.50$). Overall, ShiftBench shows that post-shift recovery is a distinct evaluation axis that can change memory-policy selection.
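
To make the ranking-agreement statistics in the abstract concrete, the following is a minimal sketch of how a Spearman $\rho$ and a pairwise inversion rate could be computed between two score tables for the same set of memory policies. It is not the ShiftBench implementation. The policy names and scores are made up for illustration, the function names are hypothetical, and "inversion" is assumed here to mean the fraction of policy pairs whose relative order flips between the two metrics; the abstract does not define it.

```python
from itertools import combinations


def ranks(scores):
    """Rank items 1..n by descending score (assumes no tied scores)."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(order)}


def spearman_rho(a, b):
    """Spearman rank correlation via the no-ties formula 1 - 6*sum(d^2) / (n(n^2-1))."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    d2 = sum((ra[k] - rb[k]) ** 2 for k in ra)
    return 1 - 6 * d2 / (n * (n * n - 1))


def inversion_rate(a, b):
    """Fraction of item pairs ordered one way under `a` and the other way under `b`.

    Assumed definition; the source abstract does not spell it out.
    """
    pairs = list(combinations(a, 2))
    flipped = sum(1 for x, y in pairs if (a[x] - a[y]) * (b[x] - b[y]) < 0)
    return flipped / len(pairs)


# Hypothetical scores for four placeholder policies (not results from the paper).
long_horizon_accuracy = {"policy_a": 0.9, "policy_b": 0.8, "policy_c": 0.7, "policy_d": 0.6}
post_shift_recovery = {"policy_a": 0.5, "policy_b": 0.7, "policy_c": 0.6, "policy_d": 0.8}

rho = spearman_rho(long_horizon_accuracy, post_shift_recovery)
inv = inversion_rate(long_horizon_accuracy, post_shift_recovery)
print(f"Spearman rho = {rho:.2f}, inversion rate = {inv:.2f}")
```

Under this reading, a negative $\rho$ together with an inversion rate above one half means that picking the best policy by long-horizon accuracy would tend to pick a poor one by post-shift recovery.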
