Large Language Model (LLM) as an Excellent Reinforcement Learning Researcher in both Single-Agent and Multi-Agent Scenarios
Abstract
In quantitative finance, and particularly in order execution, reinforcement learning (RL) has shown great promise because it can interact with market environments built from real data. However, traditional RL methods suffer from slow research cycles and rely on static market assumptions that ignore the impact of the agent's own execution actions on the environment. To address these limitations, we propose a Self-Evolutional single-agent/multi-agent Reinforcement Learning (SE-RL) framework. The framework uses a Large Language Model (LLM) to design the modules of an RL algorithm, such as the agent model, reward function, profiling, communication, and state imagination, by having the LLM generate either module outputs or code. SE-RL continuously improves the accuracy of the LLM-generated RL algorithms through a dual-enhancement kit that operates at both a high level (prompt refinement) and a low level (parameter fine-tuning). Additionally, we use a multi-agent system to simulate dynamic financial markets, which accounts for the impact of order executions on market dynamics. To further enhance training in such a dynamic market, we develop a hybrid environment training method that rebalances the loss weight of each environment. Comprehensive experiments on 200 realistic stock datasets demonstrate that the proposed framework outperforms current state-of-the-art baselines. Project page and code: https://iclr2026-anonymous-workshop.github.io
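The abstract does not state how the per-environment loss weights are rebalanced. As a minimal illustrative sketch only, the following assumes a softmax rule in which the environment with the larger current loss receives the larger weight. The function names `rebalance_weights` and `hybrid_loss`, the `temperature` parameter, the environment labels, and the loss values are all hypothetical and are not taken from the paper.

```python
import math


def rebalance_weights(losses, temperature=1.0):
    """Assumed rule: softmax over per-environment losses.

    A larger loss yields a larger weight; the weights sum to 1.
    """
    peak = max(losses.values())  # subtract the max for numerical stability
    exps = {
        name: math.exp((loss - peak) / temperature)
        for name, loss in losses.items()
    }
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


def hybrid_loss(losses, weights):
    """Weighted sum of the per-environment losses."""
    return sum(weights[name] * losses[name] for name in losses)


# Hypothetical losses from a static (historical replay) environment
# and a dynamic (multi-agent simulated) environment.
losses = {"static": 0.8, "dynamic": 1.4}
weights = rebalance_weights(losses)
total = hybrid_loss(losses, weights)
```

Under this assumed rule, the environment on which the agent currently performs worse contributes more to the combined objective, and raising `temperature` moves the weights toward a uniform split.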