Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data
Youssef Chaabouni · David Gamarnik
Abstract
We study sparse recovery when observations come from mixed-quality sources: a small collection of high-quality measurements with small noise variance and a larger collection of lower-quality measurements with higher variance. For this heterogeneous-noise setting, we establish sample-size conditions for information-theoretic and algorithmic recovery. On the information-theoretic side, we show that $(n_1, n_2)$ must satisfy a linear trade-off defining the _Price of Quality_: the number of low-quality samples needed to replace one high-quality sample. In the agnostic setting, where the decoder is completely agnostic to the quality of the data, it is uniformly bounded, and in particular one high-quality sample is never worth more than two low-quality samples. In the informed setting, where the decoder is informed of per-sample variances, the price of quality can grow arbitrarily large. On the algorithmic side, we analyze the LASSO in the agnostic setting and show that the recovery threshold matches the homogeneous-noise case and only depends on the average noise level, revealing a striking robustness of computational recovery to data heterogeneity. Together, these results give the first conditions for sparse recovery with mixed-quality data and expose a fundamental difference between how the information-theoretic and algorithmic thresholds adapt to changes in data quality.
Successful Page Load