Poster in Workshop: Workshop on Multi-Agent Learning and Its Opportunities in the Era of Generative AI
Mon, Apr 27, 2026 • 12:00 PM – 12:20 PM PDT

CATTLE TRADE: A MULTI-AGENT BENCHMARK FOR LLM BLUFFING, BIDDING, AND NEGOTIATION

Robert Müller

Abstract

Standard benchmarks evaluate LLM knowledge and single-agent reasoning, but miss the capabilities required for real-world strategic interaction: bluffing, negotiation, and long-term resource management. Existing game benchmarks isolate individual skills, such as deception in Werewolf or bidding in simple auctions, rather than requiring their integrated deployment. We introduce CATTLE TRADE, a benchmark based on the card game Kuhhandel that integrates competitive auctions, hidden-information trades, and deceptive offers within games of 50–60 turns. We evaluate 6 frontier LLMs across 33 games and find that strategic commitment, measured through offer values in trades and buy-right exercise rates, strongly predicts success, while pure bluffing strategies underperform.