CATTLE TRADE: A MULTI-AGENT BENCHMARK FOR LLM BLUFFING, BIDDING, AND NEGOTIATION
Abstract
Standard benchmarks evaluate LLM knowledge and single-agent reasoning but miss the capabilities required for real-world strategic interaction: bluffing, negotiation, and long-term resource management. Existing game benchmarks isolate individual skills, such as deception in Werewolf or bidding in simple auctions, rather than requiring their integrated deployment. We introduce CATTLE TRADE, a benchmark based on the card game Kuhhandel¹ that integrates competitive auctions, hidden-information trades, and deceptive offers within games of 50–60 turns. We evaluate six frontier LLMs across 33 games and find that strategic commitment, measured through offer values in trades and buy-right exercise rates, strongly predicts success, while pure bluffing strategies underperform.