From Interaction to Abstraction: Using Behavior and Brain Activity to Evaluate How AI Systems Learn Games
Abstract
Humans rapidly learn abstract knowledge when encountering novel environments and flexibly deploy this knowledge to guide efficient and intelligent action. Can modern AI systems learn and plan in a similar way? Leveraging a unique dataset of human gameplay with concurrent fMRI recordings, we compare model-free and model-based reinforcement learning agents, Bayes-optimal model-based agents, and a frontier Large Reasoning Model. We evaluate models on task performance and behavioral alignment, then use brain activity as a benchmark to probe the internal representations these models construct. Specifically, using encoding models, we assess how well each system’s internal representations predict brain activity in regions previously implicated in theory-based reasoning. We find that the Large Reasoning Model most closely matches human behavioral patterns during game discovery and predicts brain activity in theory-coding regions an order of magnitude better than both model-free and model-based alternatives. Our results shed light on the computational principles underlying human-like rapid learning and planning.
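To make the encoding-model evaluation concrete, the following is a minimal sketch of one common form of such an analysis: a ridge regression that maps a model's internal features to voxel responses and is scored by held-out Pearson correlation. The abstract does not specify the regression method, regularization, or scoring metric, so these choices are assumptions. The function names (`fit_ridge`, `heldout_correlation`), the `alpha` value, and the synthetic data are hypothetical and stand in for real model activations and fMRI recordings.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def heldout_correlation(X_train, Y_train, X_test, Y_test, alpha=1.0):
    """Fit on training data and return the per-voxel Pearson r on held-out data."""
    # Standardize features using training statistics only, to avoid leakage.
    mu_x = X_train.mean(axis=0)
    sd_x = X_train.std(axis=0) + 1e-8
    mu_y = Y_train.mean(axis=0)

    W = fit_ridge((X_train - mu_x) / sd_x, Y_train - mu_y, alpha)
    Y_pred = ((X_test - mu_x) / sd_x) @ W + mu_y

    # Pearson correlation between predicted and observed responses, per voxel.
    yp = Y_pred - Y_pred.mean(axis=0)
    yt = Y_test - Y_test.mean(axis=0)
    denom = np.sqrt((yp ** 2).sum(axis=0) * (yt ** 2).sum(axis=0)) + 1e-12
    return (yp * yt).sum(axis=0) / denom


# Synthetic stand-in data (assumption): 250 time points, 10 model features, 5 voxels.
rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_vox = 200, 50, 10, 5
X = rng.standard_normal((n_train + n_test, n_feat))
W_true = rng.standard_normal((n_feat, n_vox))
Y = X @ W_true + 0.5 * rng.standard_normal((n_train + n_test, n_vox))

r = heldout_correlation(X[:n_train], Y[:n_train], X[n_train:], Y[n_train:])
print(r.shape)
```

Under this sketch, comparing systems would amount to swapping in each system's features for `X` and comparing the resulting held-out correlations within the same set of voxels.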