Invited Talk in Workshop: ICLR 2025 Workshop on Human-AI Coevolution

From Measurement to Meaning: A Validity-Centered Approach to AI Evaluation

Sanmi Koyejo

Abstract:

While the capabilities and utility of AI systems have advanced, rigorous evaluation norms have lagged. Grand claims, such as models achieving general intelligence, are often evaluated with narrow benchmarks, like performance on graduate-level exam questions, which provide a limited and potentially misleading assessment. We provide a structured approach to reasoning about the types of evidence required for an evaluation to sufficiently support a claim. Our framework emphasizes both a claim-first and a measurement- and evaluation-first approach: various stakeholders provide measurements and evaluations, and users draw on them, treating evaluation as a system, to validate claims and decisions.
