Geometry-Preserving Coresets for Quantized Foundation Models in Remote Sensing
Abstract
We reveal a fundamental yet overlooked coupling in foundation model deployment: data selection and quantization cannot be optimized independently. Through comprehensive experiments on remote sensing classification under extreme constraints (5\% labeled data, INT8/binary quantization), we demonstrate that standard coreset selection strategies, while effective at full precision, suffer catastrophic accuracy collapse once models are quantized, with binary networks degrading to near-chance performance. This failure occurs because conventional methods prioritize decision uncertainty while ignoring representation geometry, which quantization fundamentally distorts. We introduce Entropy-Based Density-Weighted Coresets (EntropyBDWC), a geometry-aware selection strategy that explicitly preserves local embedding structure under discretization. Evaluated across three datasets, four architectures, and multiple precision regimes, EntropyBDWC consistently outperforms entropy-based and random sampling under INT8 quantization and substantially stabilizes binary networks. Critically, we show that performing selection in the embedding space of a frozen foundation model (DINO) amplifies this robustness, suggesting a new role for foundation models as embedding spaces for data selection rather than as trainable backbones. Our results indicate that quantization-aware data curation is essential rather than optional, with implications extending beyond remote sensing to any resource-constrained deployment of foundation models.