Poster
Multi-modal Learning: A Look Back and the Road Ahead
Divyam Madaan · Sumit Chopra · Kyunghyun Cho
Hall 3 + Hall 2B #550
Advancements in language models have spurred increasing interest in multi-modal AI — models that process and understand information across multiple forms of data, such as text, images, and audio. While the goal is to emulate the human-like ability to handle diverse information, a key question remains: do human-defined modalities align with machine perception? If not, how does this misalignment affect AI performance? In this blog, we examine these questions by reflecting on the community's progress in developing multi-modal benchmarks and architectures and by highlighting their limitations. By reevaluating our definitions and assumptions, we propose ways to better handle multi-modal data by building models that analyze and combine modality contributions both independently and jointly with other modalities.