Poster

Faster Approximation of Probabilistic and Distributional Values via Least Squares

Weida Li · Yaoliang Yu

2024 Poster

Project Page [ Poster] [ OpenReview]

Abstract

The family of probabilistic values, axiomatically-grounded in cooperative game theory, has recently received much attention in data valuation. However, it is often computationally expensive to compute exactly (exponential w.r.t. the number of data to valuate denoted by $n$). The existing generic estimator costs $O(n^2\log n)$ utility evaluations to achieve an $(\epsilon,\delta)$-approximation under the 2-norm, while faster estimators have been developed recently for special cases (e.g., empirically for the Shapley value and theoretically for the Banzhaf value). In this work, starting from the discovered connection between probabilistic values and least square regressions, we propose a Generic Estimator based on Least Squares (GELS) along with its variants that cost $O(n\log n)$ utility evaluations for many probabilistic values, largely extending the scope of this currently best complexity bound. Moreover, we show that each distributional value, proposed by Ghorbani et al. (2020) to alleviate the inconsistency of probabilistic values induced by using distinct databases, can also be cast as optimizing a similar least square regression. This observation leads to a theoretically-grounded framework TrELS (Training Estimators based on Least Squares) that can train estimators towards the specified distributional values without requiring any supervised signals. Particularly, the trained estimators are capable of predicting the corresponding distributional values for unseen data, largely saving the budgets required for running Monte-Carlo methods otherwise. Our experiments verify the faster convergence of GELS, and demonstrate the effectiveness of TrELS in learning distributional values. Our code is available at https://github.com/watml/fastpvalue.

Video

Chat is not available.