Self-Adapting Agents for Automating Research Coding Workflows
Abstract
Existing prompt‑optimization techniques update behavior using only local signals and neglect broader patterns that recur across tasks, which leads to poor generalization; they also often rely on full‑prompt rewrites or unstructured merges, which cause knowledge loss. These limitations are magnified in research‑coding workflows, which are characterized by heterogeneous repositories, underspecified environments, and weak feedback, and in which reproducing results from public codebases is an established evaluation regime. We introduce the Self‑Adapting Research Engineer (SARE), a framework that learns from a Global Training Context of cross‑repository execution trajectories. SARE recognizes recurring failure modes, distills them into reusable heuristics, and performs targeted edits over three configurable fields: the system prompt, a task‑prompt template, and a cumulative cheatsheet that preserves validated instructions while incrementally adding new strategies. With this reflective prompt‑optimization framework, SARE improves on the prior state‑of‑the‑art human performance by 23.6% on SUPER, 3.5% on ResearchCodeBench, and 7.1% on ScienceAgentBench under each benchmark's respective metric, and it surpasses prior prompt‑optimization techniques.
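To make the contrast between targeted field edits and full‑prompt rewrites concrete, the following is a minimal illustrative sketch, not SARE's implementation: the abstract does not specify how edits are represented. It assumes the three configurable fields live in one immutable record, that a targeted edit replaces exactly one text field, and that the cheatsheet is append‑only (existing entries are never dropped; only unseen heuristics are added). The names `PromptConfig`, `edit_field`, `extend_cheatsheet`, and `render` are hypothetical, and the `{task}` placeholder is an assumed template convention.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Tuple


@dataclass(frozen=True)
class PromptConfig:
    """Hypothetical record holding the three configurable fields."""
    system_prompt: str
    task_template: str                # assumed to contain a "{task}" placeholder
    cheatsheet: Tuple[str, ...] = ()  # validated heuristics, in insertion order


def edit_field(cfg: PromptConfig, name: str, new_text: str) -> PromptConfig:
    """Targeted edit: replace one text field and leave every other field unchanged."""
    if name not in ("system_prompt", "task_template"):
        raise ValueError(f"not an editable text field: {name}")
    return replace(cfg, **{name: new_text})


def extend_cheatsheet(cfg: PromptConfig, heuristics: Iterable[str]) -> PromptConfig:
    """Cumulative update: keep all existing entries and append only unseen, non-empty ones."""
    seen = set(cfg.cheatsheet)
    added = []
    for h in heuristics:
        h = h.strip()
        if h and h not in seen:
            seen.add(h)
            added.append(h)
    return replace(cfg, cheatsheet=cfg.cheatsheet + tuple(added))


def render(cfg: PromptConfig, task: str) -> str:
    """Assemble the final prompt from the three fields (assumed layout)."""
    notes = "\n".join(f"- {h}" for h in cfg.cheatsheet)
    return f"{cfg.system_prompt}\n\nCheatsheet:\n{notes}\n\n{cfg.task_template.format(task=task)}"


if __name__ == "__main__":
    # Illustrative heuristics only; these are not taken from the paper.
    cfg = PromptConfig("You are a research engineer.", "Reproduce the result: {task}")
    cfg = extend_cheatsheet(cfg, ["Pin dependency versions first.", "Read the README install steps."])
    cfg = extend_cheatsheet(cfg, ["Pin dependency versions first.", "Smoke-test on a tiny data subset."])
    print(render(cfg, "Table 2, row 1"))
```

The design choice this sketch illustrates is that a failed or partial update can only add entries or swap a single field, so earlier validated instructions cannot be silently lost, which is the failure mode the abstract attributes to full‑prompt rewrites and unstructured merges.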