

Oral in Workshop: Machine Learning for Remote Sensing (ML4RS)

Evaluating Tool-Augmented Agents in Remote Sensing Platforms

Simranjit Singh · Michael Fore · Dimitrios Stamoulis


Abstract:

Tool-augmented Large Language Models (LLMs) have shown impressive capabilities in remote sensing (RS) applications. However, existing benchmarks assume question-answering input templates over predefined image-text data pairs. These standalone instructions neglect the intricacies of realistic, user-grounded tasks. Consider a geospatial analyst: they zoom into a map area, draw a region over which to collect satellite imagery, and succinctly ask "Detect all objects here". Where is "here" if it is not explicitly hardcoded in the image-text template, but is instead implied by the system state, e.g., live map positioning? To bridge this gap, we present Geo-ToolQA, a benchmark designed to capture long sequences of verbal, visual, and click-based actions on a real UI platform. Through in-depth evaluation of state-of-the-art LLMs over a diverse set of 500 tasks, we offer insights towards stronger agents for RS applications.
