Grounding Long-Horizon Agent Coordination in GUI Environments via Contract-based Structural Planning
Hao Yu ⋅ Weiming Li ⋅ Yueming Lyu ⋅ Jie-Jing Shao ⋅ Yulei Sui ⋅ Ivor Tsang ⋅ Haiyan Yin
Abstract
Long-horizon multi-agent GUI automation is critically bottlenecked by context drift, in which an agent’s internal plan progressively diverges from the actual environment state due to stochastic execution outcomes. Crucially, this failure stems not primarily from insufficient reasoning capacity but from a representational mismatch: current agentic systems encode plans as transient textual histories that cannot absorb execution evidence without discarding valid prior assumptions. We introduce G-Weaver, a framework that grounds multi-agent coordination in a shared plan and execution evidence substrate. In this paradigm, plans are decomposed into subgoals that are formalized as atomic semantic contracts with verifiable pre- and post-conditions. This transforms planning from the generation of disposable action sequences into the maintenance of persistent structural commitments that absorb execution feedback without requiring global replanning. At the core of G-Weaver is a Structural Weaving mechanism that integrates execution evidence through localized structural plan revisions that preserve plan identity. This design supports constructive, monotonic plan evolution in which verified progress is not discarded, failures are attributed to specific violated assumptions, and recovery proceeds without the variance of global replanning. Experiments on the OSWorld benchmark demonstrate that G-Weaver, using lightweight language models, can achieve performance comparable to or exceeding strong multi-agent baselines such as CoAct-1 while significantly reducing inference cost, by up to $35\times$.
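The idea of subgoals as contracts with verifiable pre- and post-conditions, revised locally rather than replanned globally, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Contract`, `run_plan`, `dismiss_dialog_repair`, and `build_demo`, the dictionary-based state, and the toy GUI scenario are all hypothetical and chosen only for illustration. A violated pre-condition is attributed to the specific contract that declared it, and a repair contract is spliced in before that contract while already verified contracts are left untouched.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, bool]  # toy stand-in for an observed GUI state (assumption)


@dataclass
class Contract:
    """A subgoal with checkable pre- and post-conditions (hypothetical)."""
    name: str
    pre: Callable[[State], bool]
    post: Callable[[State], bool]
    act: Callable[[State], None]
    verified: bool = False


def run_plan(plan: List[Contract], state: State,
             repair: Callable[[Contract, State], Contract],
             max_steps: int = 20) -> List[str]:
    """Execute contracts in order; on a violated pre-condition, splice in a
    local repair contract instead of rebuilding the whole plan."""
    log: List[str] = []
    i = 0
    steps = 0
    while i < len(plan) and steps < max_steps:
        steps += 1
        contract = plan[i]
        if not contract.pre(state):
            # Attribute the failure to this contract's violated assumption.
            log.append(f"pre-violated:{contract.name}")
            plan.insert(i, repair(contract, state))  # localized revision
            continue
        contract.act(state)
        if contract.post(state):
            contract.verified = True  # verified progress is kept
            log.append(f"verified:{contract.name}")
            i += 1
        else:
            log.append(f"post-violated:{contract.name}")  # retry same contract
    return log


def dismiss_dialog_repair(failed: Contract, state: State) -> Contract:
    """Hypothetical repair: close an unexpected dialog that blocks `failed`."""
    return Contract(
        name="dismiss_dialog",
        pre=lambda s: s["dialog_open"],
        post=lambda s: not s["dialog_open"],
        act=lambda s: s.update(dialog_open=False),
    )


def build_demo():
    """Toy scenario: opening the app unexpectedly pops up a dialog."""
    state: State = {"app_open": False, "dialog_open": False, "file_saved": False}
    plan = [
        Contract(
            name="open_app",
            pre=lambda s: True,
            post=lambda s: s["app_open"],
            act=lambda s: s.update(app_open=True, dialog_open=True),
        ),
        Contract(
            name="save_file",
            pre=lambda s: s["app_open"] and not s["dialog_open"],
            post=lambda s: s["file_saved"],
            act=lambda s: s.update(file_saved=True),
        ),
    ]
    return plan, state


plan, state = build_demo()
for entry in run_plan(plan, state, dismiss_dialog_repair):
    print(entry)
```

In this toy run the unexpected dialog violates the pre-condition of `save_file`, so only a `dismiss_dialog` contract is inserted before it; `open_app` stays verified and is not re-executed. How G-Weaver actually represents contracts, verifies conditions against a real GUI, and selects repairs is not specified in the abstract.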