

Poster in Workshop: Pitfalls of limited data and computation for Trustworthy ML

Fairness-Aware Data Valuation for Supervised Learning

José Pombal · Pedro Saleiro · Mario Figueiredo · Pedro Bizarro


Abstract: Data valuation is a field of ML that studies how much individual training instances contribute to a given predictive task. Although data bias is one of the main sources of downstream model unfairness, previous work in data valuation does not consider how training instances may influence both the performance and the fairness of ML models. We therefore propose Fairness-Aware Data Valuation (FADO), a data valuation framework that can be used to incorporate fairness concerns into a range of ML tasks (e.g., data pre-processing, exploratory data analysis, active learning). We propose an entropy-based data valuation metric suited to our two-pronged goal of maximizing both performance and fairness; this metric is more computationally efficient than existing ones. We then show how FADO can serve as the basis for pre-processing techniques that mitigate unfairness. Our methods achieve promising results (up to a 40 percentage point (p.p.) improvement in fairness at a cost of less than 1 p.p. in performance, compared to a baseline) and promote fairness in a data-centric way, in which a deeper understanding of data quality takes center stage.
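The abstract does not define its entropy-based metric, so the following is only a rough illustration of the general idea, not FADO itself. It assumes a binary classifier that outputs P(y=1 | x) for each training instance and a known sensitive-group label per instance. Each instance is scored by the Shannon entropy of its prediction, and a pruning step then keeps the highest-scoring fraction within each sensitive group, so that no group is discarded disproportionately. The functions `entropy_values` and `select_per_group` and the per-group pruning rule are all hypothetical.

```python
import math
from collections import defaultdict


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a Bernoulli(p) prediction."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully confident prediction carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def entropy_values(probs):
    """Hypothetical per-instance value: entropy of the model's P(y=1 | x)."""
    return [binary_entropy(p) for p in probs]


def select_per_group(values, groups, keep_fraction):
    """Hypothetical pre-processing step (not the paper's technique).

    Keeps the highest-valued fraction of instances *within each* sensitive
    group, rather than ranking globally, so pruning cannot wipe out one group.
    Returns the sorted indices of the kept instances.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    kept = []
    for members in by_group.values():
        # Rank this group's instances by value, highest first.
        members.sort(key=lambda i: values[i], reverse=True)
        n_keep = max(1, round(keep_fraction * len(members)))
        kept.extend(members[:n_keep])
    return sorted(kept)


if __name__ == "__main__":
    probs = [0.5, 0.9, 0.99, 0.6, 0.05, 0.7]
    groups = ["a", "a", "a", "b", "b", "b"]
    vals = entropy_values(probs)
    print(select_per_group(vals, groups, keep_fraction=0.7))  # [0, 1, 3, 5]
```

Ranking inside each group is just one simple way to make a value-based pruning step group-aware. A fairness-aware metric such as the one the abstract describes would presumably fold fairness into the value itself, rather than only into the selection rule.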
