Workshop
Wiki-M3L: Wikipedia and Multimodal & Multilingual Research
Miriam Redi · Yannis Kalantidis · Krishna Srinivasan · Yacine Jernite · Tiziano Piccardi · Diane Larlus · Stéphane Clinchant · Lucie-Aimée Kaffee
Fri 29 Apr, 3 a.m. PDT
In the broader AI research community, Wikipedia data has been used for many years as part of the training datasets for (multilingual) language models such as BERT. However, its content is still a largely untapped resource for vision and multimodal learning systems.
Aside from a few recent cases, most vision and language efforts either work on narrow domains and small vocabularies or are available only in English, which limits the diversity of perspectives and audiences these technologies incorporate. Recent methods leverage large-scale data for multimodal pretraining, and Wikipedia is one of the few open resources central to that effort.
With this workshop, we offer a space that brings together researchers in vision, language, and multilingual learning with members of the Wikimedia community, to discuss how these two groups can help and support each other. We will explore existing aspects and new frontiers of multilingual understanding of vision and language, focusing on the unique nature of Wikimedia's mission: to bring free knowledge to the whole world equally.
Besides invited talks and panel discussions, our workshop will present the winning entries of an ongoing Wikimedia-led, large-scale challenge on multilingual, multimodal image-text retrieval. Using the publicly available Wikipedia-based Image Text (WIT) dataset, which contains 37 million image-text sets across 108 languages, we will present the benchmark and the top methods along a disaggregated set of performance, fairness, and efficiency metrics.
Schedule
Fri 3:00 a.m. - 3:20 a.m. | Opening Remarks (Opening remarks)
Fri 3:20 a.m. - 4:20 a.m. | Session: Open Data (Keynotes and Q&A) | Omar Sanseviero · Hady Elsahar
Fri 4:20 a.m. - 5:20 a.m. | Session: Multimodality and Multilinguality - 1 (Keynotes and Q&A) | Lucia Specia · Preethi Jyothi
Fri 5:20 a.m. - 6:00 a.m. | Ask a Wikipedian & Poster Session (posters and breakout rooms) | Isaac Johnson · Emily Lescak · Byungsoo Ko · Geonmo Gu · Nicola Messina · Davide Alessandro Coccomini · Fabrizio Falchi · Andrea Esuli
Fri 5:20 a.m. - 6:00 a.m. | Ask a Wikipedian group 2 (posters)
Fri 5:20 a.m. - 6:00 a.m. | Ask a Wikipedian group 3 (posters)
Fri 6:00 a.m. - 7:00 a.m. | Session: Wikimedia and the community (Keynotes and Q&A) | Leila Zia · Andrew Lih · Caroline Becker
Fri 7:00 a.m. - 8:00 a.m. | Panel: Multilinguality in multimodal research and open data (Panel)
Fri 8:00 a.m. - 9:00 a.m. | Panel: How can Wikimedia and CV/ML communities learn from each other? (Panel)
Fri 9:00 a.m. - 10:00 a.m. | Session: Wikipedia Image/Caption Matching Competition (Live presentations and Q&A) | Miriam Redi · Krishna Srinivasan · Zhao He · Peng Lu · miaou miaou · Fabrizio Falchi · Nicola Messina · Andrea Esuli · Davide Alessandro Coccomini
Fri 10:00 a.m. - 10:10 a.m. | Multimodality and large-scale vision (Talk) | Tom Duerig | SlidesLive Video
Fri 10:10 a.m. - 10:20 a.m. | Florence-VL overview (Talk) | Lijuan Wang | SlidesLive Video
Fri 10:20 a.m. - 10:30 a.m. | Multitask and Reliable Vision and Language Models (Talk) | Marcus Rohrbach
Fri 10:30 a.m. - 11:00 a.m. | Secrets of large-scale vision and language model pre-training (Panel) | Tom Duerig · Lijuan Wang · Marcus Rohrbach
Fri 11:00 a.m. - 12:00 p.m. | Biases in AI and Indigenous data sovereignty (Keynotes and Q&A) | Michael Running Wolf · Margaret Mitchell
Fri 12:00 p.m. - 12:30 p.m. | Session: Multimodality and Multilinguality - 2 (Keynotes and Q&A) | Jason Baldridge
Fri 12:30 p.m. - 12:35 p.m. | Transformer-Based Multi-modal Proposal and Re-Rank for Wikipedia Image-Caption Matching (Oral) | Nicola Messina · Davide Alessandro Coccomini · Fabrizio Falchi · Andrea Esuli | SlidesLive Video
Fri 12:35 p.m. - 12:40 p.m. | Large-scale Bilingual Language-Image Contrastive Learning (Oral) | Byungsoo Ko · Geonmo Gu | SlidesLive Video
Fri 12:40 p.m. - 12:45 p.m. | Considerations for Multilingual Wikipedia Research (Oral) | Isaac Johnson · Emily Lescak
Fri 12:45 p.m. - 1:00 p.m. | Papers Q&A (Q&A)
Fri 1:00 p.m. - 1:20 p.m. | Closing remarks (Closing remarks)