Poster in Workshop: 4th ICLR Workshop on Machine Learning for Remote Sensing
Mon, Apr 27, 2026 • 11:00 AM – 12:00 PM PDT

When Does Embedding Arithmetic Fail? A Systematic Analysis in Remote Sensing Vision-Language Models

Jinpyo Hong ⋅ Le Yu


Abstract

Embedding arithmetic promises flexible compositional queries over remote sensing (RS) imagery, for example transforming a harbor into an airport by subtracting "water" and adding "runway", yet the conditions under which it actually works remain poorly understood. We systematically evaluate four CLIP-based models across five RS datasets and identify concept entanglement as the dominant failure mode (40-60% of failures): semantically related concepts occupy overlapping embedding subspaces that confound arithmetic. We propose a pre-hoc entanglement metric, requiring only text embeddings, that predicts failure with AUC up to 0.818, with GeoRSCLIP showing the most consistent predictions (mean AUC = 0.675). Notably, embedding geometry does not reliably predict compositional capability (r = 0.30, p = 0.20), suggesting that discriminative and compositional reasoning require different representational properties. We provide practical guidelines: arithmetic succeeds for well-separated concepts (88%) but fails predictably for structurally similar classes (42%).
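
The harbor-to-airport example can be made concrete with a small sketch. It is illustrative only: the 3-dimensional vectors are toy stand-ins for real CLIP embeddings, and `compose_query` and `entanglement_proxy` are hypothetical names. The abstract does not define the proposed entanglement metric, so the proxy below (cosine similarity between the two text embeddings) is an assumption, not the authors' formula.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def compose_query(source, subtract, add):
    """Embedding arithmetic: source - subtract + add, renormalized."""
    return normalize([s - r + a for s, r, a in zip(source, subtract, add)])


def entanglement_proxy(subtract_text, add_text):
    """Hypothetical proxy (an assumption, not the paper's metric):
    cosine similarity between the two concept text embeddings.
    Higher values mean the concepts overlap more."""
    return cosine(subtract_text, add_text)


# Toy vectors standing in for real embeddings (assumed values).
harbor = [1.0, 1.0, 0.0]
water = [0.0, 1.0, 0.0]
runway = [0.0, 0.0, 1.0]
airport = [1.0, 0.0, 1.0]

query = compose_query(harbor, water, runway)
similarity_to_target = cosine(query, airport)
proxy = entanglement_proxy(water, runway)
```

In this toy setup "water" and "runway" are orthogonal, so the proxy is zero and the composed query lands on the target. When the subtracted and added concepts share directions, the proxy rises, which is the kind of overlap the abstract associates with failure.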
