Poster
Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search
Jonathan Light · Min Cai · Weiqin Chen · Guanzhi Wang · Xiusi Chen · Wei Cheng · Yisong Yue · Ziniu Hu
Hall 3 + Hall 2B #275
Traditional reinforcement learning and planning methods require large amounts of data and training to develop effective strategies. Large language models (LLMs), by contrast, generalize well and can perform tasks without prior training, but they struggle with complex planning and decision-making. We introduce STRATEGIST, a new approach that combines the strengths of both: an LLM generates and updates high-level strategies in text form, while a Monte Carlo Tree Search (MCTS) algorithm refines and executes them. STRATEGIST is a general framework that optimizes strategies through self-play simulations without requiring any training data. We test STRATEGIST in competitive, multi-turn games with partial information, including the Game of Pure Strategy (GOPS) and The Resistance: Avalon, a multi-agent hidden-identity discussion game. Our results show that STRATEGIST-based agents outperform traditional reinforcement learning models, other LLM-based methods, and existing LLM agents, while achieving performance comparable to that of human players.
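The abstract describes a bi-level loop: at the high level an LLM revises a strategy written in plain text, and at the low level self-play simulations guided by MCTS score each revision. The sketch below illustrates only that control flow under stated assumptions; every name in it (llm_propose_improvement, evaluate_by_self_play, strategist_loop) is a hypothetical placeholder rather than the authors' actual API, and the evaluation step is stubbed out instead of running real MCTS rollouts.

```python
# Minimal sketch of the bi-level self-improvement loop described above.
# All function names are hypothetical placeholders, not the paper's API.
import random


def llm_propose_improvement(strategy: str, feedback: str) -> str:
    """Placeholder: ask an LLM to rewrite the text strategy given feedback."""
    return strategy + f"\n# revised using feedback: {feedback}"


def evaluate_by_self_play(strategy: str, num_games: int = 8) -> float:
    """Placeholder: estimated win rate from MCTS-guided self-play games.

    In the real system the strategy text would condition the agent's
    low-level action selection, which MCTS refines at each turn; here a
    random value stands in for the simulated win rate.
    """
    return random.random()


def strategist_loop(seed_strategy: str, iterations: int = 5) -> str:
    best_strategy = seed_strategy
    best_value = evaluate_by_self_play(best_strategy)
    for _ in range(iterations):
        # High level: the LLM edits the strategy in natural language.
        feedback = f"current estimated win rate: {best_value:.2f}"
        candidate = llm_propose_improvement(best_strategy, feedback)
        # Low level: self-play simulations (with MCTS) score the candidate.
        value = evaluate_by_self_play(candidate)
        if value > best_value:  # keep the revision only if it helps
            best_strategy, best_value = candidate, value
    return best_strategy


if __name__ == "__main__":
    print(strategist_loop("Play aggressively early; bluff when behind."))
```

Note that this greedy accept-if-better outer loop is one plausible reading of "generate and update high-level strategies"; the paper's actual improvement and evaluation procedures may differ.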