Agent That Matters: An Attribution Framework for Multi-Agent LLMs
Abstract
Quantifying the contribution of individual agents in LLM-based multi-agent systems (MAS) is critical for improving architectural efficiency and reducing functional redundancy. We introduce a game-theoretic attribution framework that formalizes MAS collaboration as a cooperative game, so that system utility can be decomposed into principled per-agent contribution scores. In an empirical evaluation of MetaGPT on the HumanEval and MBPP benchmarks, we identify the Product Manager as a dominant veto player, whereas the QA Engineer has negligible or negative marginal impact. Our results show that in hierarchical MAS, Leave-One-Out (LOO) attribution is a reliable proxy for more expensive axiomatic estimators such as the Shapley and Banzhaf values. We further show that in-context ablation via introspective removal is not a faithful substitute for exact removal and substantially increases token usage. Taken together, these findings indicate that framing MAS as a cooperative game is a promising direction for credit assignment, providing a rigorous foundation for diagnosing architectural bottlenecks and allocating resources in complex multi-agent workflows.
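To make the three attribution estimators named in the abstract concrete, the following is a minimal sketch on a toy cooperative game. The agent names, the function toy_utility, and every numeric value in it are hypothetical and chosen only for illustration; they are not results from the paper, and the paper's actual utility function and implementation may differ. The sketch computes exact LOO, Shapley, and Banzhaf scores by enumerating all coalitions, which is feasible only for a small number of agents.

```python
from itertools import combinations
from math import factorial


def coalitions(players):
    """Yield every subset of `players` as a frozenset."""
    for r in range(len(players) + 1):
        for combo in combinations(players, r):
            yield frozenset(combo)


def leave_one_out(players, v):
    """LOO score: drop in grand-coalition utility when one agent is removed."""
    grand = frozenset(players)
    return {p: v(grand) - v(grand - {p}) for p in players}


def shapley(players, v):
    """Exact Shapley value: marginal contributions weighted by coalition size."""
    n = len(players)
    scores = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for s in coalitions(others):
            weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
            total += weight * (v(s | {p}) - v(s))
        scores[p] = total
    return scores


def banzhaf(players, v):
    """Exact Banzhaf value: unweighted mean marginal contribution."""
    n = len(players)
    scores = {}
    for p in players:
        others = [q for q in players if q != p]
        total = sum(v(s | {p}) - v(s) for s in coalitions(others))
        scores[p] = total / 2 ** (n - 1)
    return scores


def toy_utility(coalition):
    """Hypothetical utility (made-up numbers, not from the paper).

    "ProductManager" acts as a veto player: without it the utility is 0.
    "Engineer" adds value; "QA" slightly reduces it.
    """
    if "ProductManager" not in coalition:
        return 0.0
    score = 0.4
    if "Engineer" in coalition:
        score += 0.3
    if "QA" in coalition:
        score -= 0.05
    return score


if __name__ == "__main__":
    agents = ["ProductManager", "Engineer", "QA"]
    for name, estimator in [("LOO", leave_one_out),
                            ("Shapley", shapley),
                            ("Banzhaf", banzhaf)]:
        print(name, estimator(agents, toy_utility))
```

In this toy game all three estimators rank the agents in the same order and assign the QA agent a negative score, which mirrors the kind of agreement the abstract reports for hierarchical systems. LOO needs one utility evaluation per agent beyond the grand coalition, whereas the exact Shapley and Banzhaf computations above enumerate all 2^n coalitions.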