Dual-Scale World Memory for LLM Agents towards Hard-Exploration Problems
Abstract
LLM-based agents have shown promising advances, yet they remain limited on hard-exploration tasks, which require sustained exploration under sparse feedback. We present GLoW, a novel approach that leverages a dual-scale textual world memory: at the global scale, it maintains a trajectory frontier of high-value discoveries, while at the local scale, it learns from trial-and-error exploration through a Multi-path Advantage Reflection mechanism that infers advantage-based progress signals to guide exploration. To evaluate our framework on hard exploration, we tackle the Jericho benchmark suite of text-based games, where GLoW achieves new state-of-the-art performance among LLM-based approaches. Compared to state-of-the-art RL-based methods, our approach achieves comparable performance while requiring 100-800× fewer environment interactions. When scaled to stronger LLMs, GLoW surpasses all prior methods on 4 of the 6 difficult and extreme Jericho games.