Learning What to Learn: Curriculum Curation for Test-Time Agent Learning
Qizheng Zhang ⋅ Sherry Ruan ⋅ Shubhangi Upasani ⋅ Fenglu Hong ⋅ Changxiu Ji ⋅ Changran Hu ⋅ Bo Li ⋅ Hanchen Li ⋅ Kunle Olukotun
Abstract
Test-time learning enables large language model (LLM) agents to adapt during inference without costly retraining, yet prior work largely treats all test-time experience as equally useful. We ask a simple question: *what data should agents learn from at test time?* Focusing on task selection and ordering for context-based adaptation, we hypothesize that redundant or overly simple examples offer diminishing returns, while curated curricula improve sample efficiency. Using the Agentic Context Engineering (ACE) framework, we evaluate on the AppWorld benchmark, which features tool-use and coding agents. We show that careful data selection can match full-dataset performance using only ~30% of training tasks, and that task ordering measurably affects learning outcomes. Our results position curriculum curation as a first-class design dimension for efficient test-time agent learning and practical deployment.