ICLR 2022


Wiki-M3L: Wikipedia and Multimodal & Multilingual Research

Miriam Redi · Yannis Kalantidis · Krishna Srinivasan · Yacine Jernite · Tiziano Piccardi · Diane Larlus · Stéphane Clinchant · Lucie-Aimée Kaffee

In the broader AI research community, Wikipedia data has been used for many years as part of the training datasets for (multilingual) language models such as BERT. However, its content is still a largely untapped resource for vision and multimodal learning systems.

Aside from a few recent cases, most vision and language efforts work on narrow domains and small vocabularies, are available for English only, or both, which limits the diversity of perspectives and audiences these technologies incorporate. Recent methods leverage large datasets for multimodal pretraining, and Wikipedia is one of the few open resources central to that effort.

With this workshop, we propose to offer a space that brings together the community of vision, language, and multilingual learning researchers with members of the Wikimedia community, to discuss how these two groups can help and support each other. We will explore existing aspects and new frontiers of multilingual understanding of vision and language, focusing on the unique nature of Wikimedia’s mission: to bring free knowledge to the whole world equally.

Besides invited talks and panel discussions, our workshop will present the winning entries of an ongoing Wikimedia-led, large-scale challenge on multilingual, multimodal image-text retrieval. Using the publicly available Wikipedia-based Image Text (WIT) dataset, which contains 37 million image-text sets across 108 languages, we will present the benchmark and the top methods along a disaggregated set of performance, fairness, and efficiency metrics.
