Poster in Workshop: How Far Are We From AGI
How to benchmark AGI: the Adversarial Game
Emmanouil Seferis
Keywords: Artificial General Intelligence, benchmarking
Recently, it has been observed that the performance of foundation models, especially in Natural Language Processing (NLP) and Computer Vision (CV), keeps increasing rapidly, and new emergent capabilities continue to appear with increasing scale. Some researchers claim that we could soon reach a point where future models are generally more capable than humans. Given this possible scenario, and the critical safety risks involved, it is paramount that we are able to accurately assess and measure the capabilities of future models. However, we find that the related terms Artificial General Intelligence (AGI) and Artificial Super-Intelligence (ASI) are ill-defined, and that no definitive benchmarking process has been proposed. Mitigating this gap is the aim of this work. Summarizing the related literature, we propose precise definitions for AGI and ASI. Moreover, to tackle the benchmarking problem, we propose a new test, which we name “the Adversarial Game (AG)”. We show that the AG is complete, in the sense that a system is AGI if and only if it can consistently win the AG against human players. Further, we show that previous attempts to define AGI can be cast as special cases of the AG. Finally, under some standard assumptions, a system’s performance in the AG is readily measurable. Similarly, we propose related criteria for ASI. Overall, we hope that the proposed methodology can help the community towards better assessing the capabilities of future models.
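The abstract does not spell out the AG protocol itself, but the claim that "a system's performance in the AG is readily measurable" can be illustrated with a minimal sketch: assuming each round against a human player yields a binary win/loss outcome, "consistently wins" could be operationalized as the lower confidence bound of the win rate exceeding a threshold. The function names (`consistently_wins`, `wilson_lower_bound`), the 0.5 threshold, and the binary-outcome assumption are all illustrative choices, not part of the paper.

```python
import math
from typing import Sequence


def wilson_lower_bound(wins: int, n: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for an observed win rate."""
    if n == 0:
        return 0.0
    p = wins / n
    denom = 1 + z**2 / n
    centre = p + z**2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (centre - margin) / denom


def consistently_wins(outcomes: Sequence[int], threshold: float = 0.5) -> bool:
    """
    outcomes: 1 if the candidate system won a round of the Adversarial Game
    against a human player, 0 otherwise (a simplifying assumption; the paper's
    actual scoring rule may differ). Returns True if the lower confidence
    bound on the win rate exceeds the chosen threshold.
    """
    n = len(outcomes)
    wins = sum(outcomes)
    return wilson_lower_bound(wins, n) > threshold


# Example: 70 wins out of 100 rounds against human players.
print(consistently_wins([1] * 70 + [0] * 30))  # True under these assumptions
```

Using a confidence bound rather than the raw win rate is one way to make "consistently" statistically meaningful; the paper's own criterion may be stricter or defined differently.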