Poster Thu, Apr 23, 2026 • 6:30 AM – 9:00 AM PDT Pavilion 4 P4-#3301

Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness

Yuchen Song ⋅ Andong Chen ⋅ Wenxin Zhu ⋅ Kehai Chen ⋅ Xuefeng Bai ⋅ Muyun Yang ⋅ Tiejun Zhao


Abstract

Cultural awareness has emerged as a critical capability for Multimodal Large Language Models (MLLMs). However, current benchmarks lack progressive difficulty in their task design and offer few cross-lingual tasks. Moreover, they often rely on real-world images, each of which typically depicts a single culture, making these benchmarks relatively easy for MLLMs. To address these limitations, we propose C$^3$B (Comics Cross-Cultural Benchmark), a novel multicultural, multitask, and multilingual benchmark for cultural awareness. C$^3$B comprises over 2,000 images and over 18,000 QA pairs, organized into three tasks of progressive difficulty: basic visual recognition, higher-level cultural conflict understanding, and cultural content generation. We evaluated 11 open-source MLLMs and found a significant gap between their performance and human performance. This gap demonstrates that C$^3$B poses substantial challenges for current MLLMs and encourages future research to advance their cultural awareness.
