From Measurement to Meaning: A Validity-Centered Approach to AI Evaluation
Sanmi Koyejo
2025 Invited Talk in Workshop: ICLR 2025 Workshop on Human-AI Coevolution
Abstract
While the capabilities and utility of AI systems have advanced, rigorous evaluation norms have lagged. Grand claims, such as models achieving general intelligence, are often evaluated with narrow benchmarks, like performance on graduate-level exam questions, which provide a limited and potentially misleading assessment. We provide a structured approach to reasoning about the types of evidence an evaluation requires to sufficiently support a claim. Our framework emphasizes a claim-first, measurement-and-evaluation-first approach: various stakeholders provide measurements and evaluations, and users seek to validate their claims and decisions by using these measurements to evaluate the system as a whole.