ASSESSING SOVEREIGNTY IN MULTI-AGENT COLLABORATIONS
Abstract
Large Language Models (LLMs) are increasingly deployed as efficiency tools within organisations. The next step is their deployment as autonomous agents acting on behalf of organisations, institutions and stakeholders in complex socio-technical systems. In such settings, agents will need to be sovereign, i.e. able to make decisions based on private incentives, shared objectives and operational constraints, which makes their evaluation inherently multi-objective. However, existing benchmarks and evaluation frameworks for multi-LLM agent systems rely predominantly on scalar metrics that do not capture this complex objective landscape. Here, we argue that Constrained Multi-Objective Optimisation provides a deployment-relevant evaluation framework for multi-LLM agent systems. We formalise two realistic types of trade-offs and present experiments illustrating how Pareto dominance and the hypervolume indicator reveal behavioural properties that scalar metrics hide, across different scenarios run in the CAMEL environment.
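To make the two indicators named in the abstract concrete, the following is a minimal sketch, not the evaluation code used in the experiments. It assumes two objectives that are both maximised (labelled here, purely for illustration, as a private utility and a shared objective), and it leaves operational constraints out. The function names, the outcome values and the reference point are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (private utility, shared objective), both maximised


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b: no worse everywhere, better somewhere."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the points and bounded below by the reference point."""
    # Only non-dominated points that strictly improve on the reference contribute.
    front = sorted(
        (p for p in pareto_front(points) if p[0] > ref[0] and p[1] > ref[1]),
        key=lambda p: p[0],
        reverse=True,
    )
    area = 0.0
    prev_y = ref[1]
    # Sweep from the largest first objective down; each point adds one new strip.
    for x, y in front:
        area += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return area


# Hypothetical outcomes of two agent systems; every outcome sums to 1.0.
system_a = [(0.9, 0.1), (0.1, 0.9)]  # two lopsided outcomes
system_b = [(0.5, 0.5)]              # one balanced outcome
reference = (0.0, 0.0)

hv_a = hypervolume_2d(system_a, reference)
hv_b = hypervolume_2d(system_b, reference)
print(f"hypervolume A = {hv_a:.2f}, hypervolume B = {hv_b:.2f}")
```

In this toy case a scalar score that sums the two objectives rates every outcome identically, while no outcome dominates any other and the hypervolumes differ (about 0.17 for system A against 0.25 for system B). This is the kind of distinction a single scalar metric cannot express.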