Poster in Workshop: Navigating and Addressing Data Problems for Foundation Models (DPFM)

Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget

Florian Eddie Dorner · Moritz Hardt

Keywords: [ evaluation ] [ label noise ] [ data quality ] [ benchmarking ]


Abstract:

We study how to best spend a budget of noisy labels to compare the accuracy of two binary classifiers. It’s common practice to collect and aggregate multiple noisy labels for a given data point into a less noisy label via a majority vote. We prove a theorem that runs counter to conventional wisdom. If the goal is to identify the better of two classifiers, we show it’s best to spend the budget on collecting a single label for more samples. We discuss the implications of our work for the design of machine learning benchmarks, where our result overturns some time-honored recommendations.
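
The trade-off described in the abstract can be explored with a small Monte Carlo simulation. The sketch below is an illustration only, not the paper's proof or experimental setup. It assumes that the two classifiers err independently of each other, that every noisy label is flipped independently with the same probability, and that ties between the classifiers are broken by a fair coin. The function name `prob_pick_better` and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def prob_pick_better(budget, labels_per_sample, acc_a=0.80, acc_b=0.75,
                     flip_prob=0.2, trials=4000, seed=0):
    """Estimate how often classifier A (the truly better one) wins the comparison.

    budget: total number of noisy labels that can be collected.
    labels_per_sample: odd number of labels aggregated by majority vote per sample.
    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    k = labels_per_sample
    n = budget // k  # more labels per sample means fewer labeled samples

    # Assumption: each classifier is correct independently on each sample.
    a_correct = rng.random((trials, n)) < acc_a
    b_correct = rng.random((trials, n)) < acc_b

    # Assumption: each of the k labels is flipped independently.
    flips = rng.random((trials, n, k)) < flip_prob
    # The majority-vote label is wrong when more than half of the labels are flipped.
    label_wrong = flips.sum(axis=2) > k / 2

    # With binary labels, a classifier agrees with the aggregated label exactly
    # when "classifier correct" XOR "aggregated label wrong" is true.
    a_score = (a_correct ^ label_wrong).sum(axis=1)
    b_score = (b_correct ^ label_wrong).sum(axis=1)

    # Count a tie as half a win (fair coin flip).
    wins = (a_score > b_score) + 0.5 * (a_score == b_score)
    return float(wins.mean())


if __name__ == "__main__":
    budget = 900
    print("one label per sample:   ", prob_pick_better(budget, 1))
    print("three labels per sample:", prob_pick_better(budget, 3))
```

Under these assumptions, spending the whole budget on single labels should identify the better classifier more often than majority voting over three labels per sample, in line with the abstract's claim. Varying `flip_prob`, the accuracy gap, or `budget` shows how the size of the advantage depends on the setting.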
