Correcting Geospatial Data Displacement with Foundation Vision Models
Abstract
Geospatial point annotations collected during field surveys often suffer from positional displacement due to GPS inaccuracy and environmental constraints, which limits their utility for downstream applications. Traditional alignment methods rely on multi-temporal imagery or task-specific training, which restricts their practical applicability. We propose a simple preprocessing pipeline that uses foundation vision models to correct displaced annotations through semantic similarity matching before downstream analysis or model training. A semantic reference is constructed from a small set of annotated examples of the target class. For each displaced point, we define a search region and select the location within it whose embedding, computed by a feature extractor, is most similar to the reference set. We evaluate our method on a forestry dataset from the Amazon rainforest containing annotations for over 50 tree species. Linear probing experiments demonstrate that models trained on corrected annotations outperform those trained on the original displaced data, and qualitative analysis shows that corrections consistently move points from background regions toward the target class. Because it requires only a small set of reference examples and no retraining, our method provides a practical preprocessing step for improving geospatial annotation quality in field-based surveys.
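The correction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `correct_point`, the use of a mean embedding as the semantic reference, cosine similarity as the matching score, and a square search window of `radius` cells are all assumptions made for illustration. The sketch also assumes that a feature extractor has already produced a grid of embeddings (`feature_map`) for the image.

```python
import numpy as np


def correct_point(feature_map, reference_embeddings, point, radius):
    """Move `point` to the most reference-like location in a local window.

    feature_map: array of shape (H, W, D), one embedding per grid cell.
    reference_embeddings: array of shape (N, D) from annotated examples.
    point: (row, col) of the displaced annotation, in grid coordinates.
    radius: half-width of the square search window (hypothetical parameter).
    """
    height, width, _ = feature_map.shape

    # Assumption: the reference set is summarised by its mean embedding.
    prototype = reference_embeddings.mean(axis=0)
    prototype = prototype / (np.linalg.norm(prototype) + 1e-12)

    # Clip the search window to the bounds of the feature map.
    row, col = point
    r0, r1 = max(row - radius, 0), min(row + radius + 1, height)
    c0, c1 = max(col - radius, 0), min(col + radius + 1, width)
    window = feature_map[r0:r1, c0:c1]

    # Cosine similarity between every cell in the window and the prototype.
    norms = np.linalg.norm(window, axis=-1, keepdims=True)
    similarity = (window / (norms + 1e-12)) @ prototype

    # Pick the cell with the highest similarity and map it back to
    # coordinates in the full feature map.
    dr, dc = np.unravel_index(np.argmax(similarity), similarity.shape)
    return (r0 + int(dr), c0 + int(dc)), float(similarity[dr, dc])
```

In this sketch the returned similarity score could also be thresholded to leave a point unchanged when no location in the window resembles the reference; whether the original pipeline does this is not stated in the abstract.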