Geometry-aware Coresets for Efficient Spatial Reasoning with Foundation Models in Low-Resource Remote Sensing
Abstract
Efficient spatial reasoning with foundation models is increasingly constrained by two interacting factors: limited labeled data and aggressive post-training compression for edge deployment. Existing pipelines treat data selection and model compression as independent problems; we show that this independence assumption does not hold. Through extensive experiments on remote sensing classification, a canonical spatial reasoning task under extreme resource constraints, we demonstrate that standard coreset selection strategies, while effective at full precision, suffer severe and often catastrophic accuracy degradation under low-bit quantization and structured pruning. This failure arises from compression-induced distortion of embedding geometry: uncertainty-based coresets overemphasize samples near decision boundaries while neglecting the local relational structure that is essential for stable reasoning under compression. We propose Entropy-Based Density-Weighted Coresets (EntropyBDWC), a geometry-aware data selection strategy that jointly models predictive uncertainty and local embedding density. Across multiple datasets, architectures, data budgets, and compression regimes, EntropyBDWC consistently improves robustness under INT8 quantization and stabilizes performance under extreme data scarcity. Crucially, performing coreset selection in the frozen embedding space of a self-supervised foundation model (DINO) further amplifies compression robustness, revealing a new role for foundation models as geometry-preserving data selectors rather than trainable backbones. Our results establish compression-aware data curation as a necessary component of efficient spatial reasoning, with implications for edge AI, foundation model adaptation, and learning under systemic resource constraints.
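To make "jointly models predictive uncertainty and local embedding density" concrete, here is a minimal sketch of one way such a score could be computed. The abstract does not give the exact EntropyBDWC formula, so this assumes an information-density-style score: the Shannon entropy of each sample's predicted class distribution multiplied by its local density, taken as the mean cosine similarity to its `k` nearest neighbours in embedding space and raised to a power `beta`. The function names, `k`, `beta`, and the top-`budget` selection rule are all hypothetical; the embeddings could be, for example, frozen DINO features.

```python
import numpy as np


def entropy(probs, eps=1e-12):
    # Shannon entropy of each row of an (n, C) class-probability matrix.
    return -np.sum(probs * np.log(probs + eps), axis=1)


def knn_density(emb, k=10):
    # Hypothetical density proxy: mean cosine similarity to the k nearest
    # neighbours in embedding space, excluding the point itself.
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)  # never count a point as its own neighbour
    topk = np.sort(sim, axis=1)[:, -k:]  # k largest similarities per row
    return topk.mean(axis=1)


def entropy_density_coreset(probs, emb, budget, k=10, beta=1.0):
    # Hypothetical score: uncertainty (entropy) weighted by local density,
    # so uncertain points in sparse or isolated regions are down-weighted.
    dens = np.clip(knn_density(emb, k), 0.0, None)  # cosine can be negative
    score = entropy(probs) * dens ** beta
    # Greedy selection: indices of the `budget` highest-scoring samples.
    return np.argsort(-score)[:budget]


# Usage on synthetic data (random logits and embeddings).
rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 5))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
emb = rng.normal(size=(100, 16))
idx = entropy_density_coreset(probs, emb, budget=10)
print(idx.shape)  # (10,)
```

Multiplying the two terms, rather than ranking by entropy alone, is what keeps the selection from concentrating on isolated boundary points; `beta` trades off how strongly density is enforced.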