Do Language Models Deceive? Strategic Behavior and Emergent Deception in Multi-Agent Auctions
Abstract
We present the first systematic study of large language model (LLM) behavior in competitive auction settings. Using art auctions as a testbed, we run experiments across four conditions with multiple frontier models. Our findings reveal three key phenomena. First, we document emergent strategic deception: without explicit instruction, LLMs engage in deceptive behavior in 44% of competitive interactions, self-classifying their tactics (for example, false disinterest and strategic misdirection) while their public statements diverge from their private reasoning. Second, models systematically undervalue artwork that lacks provenance metadata: they recognize masterpieces visually yet assign them minimal value in the absence of authentication documentation. When metadata is provided, valuations increase by orders of magnitude, revealing internalized market norms in which provenance supersedes visual merit. Third, models accurately detect unlabeled AI-generated artwork, identifying its digital origin through visual feature analysis. These findings demonstrate that competitive contexts can induce strategic behavior that diverges from stated intentions, even in safety-trained models.