Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments
Di Wen ⋅ Lei Qi ⋅ Kunyu Peng ⋅ Kailun Yang ⋅ Fei Teng ⋅ Ao Luo ⋅ Jia Fu ⋅ Yufan Chen ⋅ Ruiping Liu ⋅ Yitian Shi ⋅ M. Sarfraz ⋅ Rainer Stiefelhagen
Abstract
Despite substantial progress in video understanding, most existing datasets are limited to Earth’s gravitational conditions. Yet microgravity alters human motion, interactions, and visual semantics, creating a critical gap for real-world vision systems and posing a challenge for domain-robust video understanding in safety-critical space applications. To address this, we introduce MicroG-4M, the first benchmark for spatio-temporal and semantic understanding of human activities in microgravity. Constructed from real-world space missions and cinematic simulations, the dataset includes $4{,}759$ clips with $13{,}261$ action annotations covering $50$ actions, $1{,}238$ context-rich captions, and over $7{,}000$ question–answer pairs on astronaut activities and scene understanding. MicroG-4M supports three core tasks: fine-grained multi-label action recognition, temporal video captioning, and visual question answering, enabling a comprehensive evaluation of both spatial localization and semantic reasoning in microgravity contexts. We establish baselines using state-of-the-art models. All data, annotations, and code are available at https://github.com/lei-qi-233/MicroG-4M.