Evaluating LLM Agents as Human Simulators in Climate Social Dilemmas
Abstract
Understanding how humans and institutions behave in climate-related social dilemmas is critical for designing effective climate policy. Yet standard agent-based models often rely on simplified decision rules or fully rational agents, and therefore struggle to capture bounded rationality, heterogeneity, and communication. We evaluate large language model (LLM) agents as behaviorally grounded simulators of companies and investors in a continuous-action climate-finance dilemma built on the InvestESG platform, and benchmark them against fully rational, profit-driven reinforcement learning (RL) agents, a centralized social planner, and human participants in the same game. Our results show that LLM-based simulations can be powerful tools for analyzing social-good-oriented policymaking: LLM agents naturally reproduce human-like cooperative tendencies, flexibly support heterogeneous behavior, and exhibit emergent coordination and even collusion when communication is introduced. These phenomena are difficult to capture with conventional modeling approaches. At the same time, the simulations can be fragile: LLM agents are sensitive to contextual framing and often require explicit numerical scaffolding to reason reliably.