

Poster in Workshop: Second Workshop on Representational Alignment (Re$^2$-Align)

Partial Alignment of Representations via Interventional Consistency

Felix Leeb · Satoshi Hayakawa · Yuhta Takida · Yuki Mitsufuji


Abstract:

Multimodal representation learning aims to integrate diverse data modalities into a shared embedding space, and a common approach is contrastive learning. However, this approach is limited by the need for large amounts of paired data, its sensitivity to data quality, and its lack of scalability when new modalities are introduced. We propose Interventional Consistency (ICon), a novel framework for learning structured representations that achieve partial alignment across modalities using unpaired annotated samples. The key idea is to align the annotation-specific information in the latent space by enforcing the consistency of controllable and recognizable semantic interventions across modalities. We demonstrate that our method aligns representations sufficiently to achieve competitive results on a novel retrieval task we introduce, called label-retrieval. Furthermore, when pre-training a model with ICon and then fine-tuning it with a small amount of paired data using CLIP, we achieve comparable retrieval performance with 2-4x fewer samples, thereby alleviating the need for paired data when learning multi-modal representations.
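The core idea of enforcing consistency of semantic interventions across modalities can be illustrated with a toy sketch. Everything below is hypothetical (the shapes, the linear encoders, and the specific penalty are illustrative assumptions, not the paper's actual model or loss): each modality encodes annotated attributes into a latent space, a "semantic intervention" shifts one annotated attribute, and the penalty asks that the latent shift this intervention induces agree across modalities, even though the two batches are unpaired.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical, not from the paper): two modalities encode the
# same 4 annotated attributes into an 8-d latent space via different
# linear encoders.
D_ATTR, D_LATENT = 4, 8
W_a = rng.normal(size=(D_LATENT, D_ATTR))  # encoder, modality A
W_b = rng.normal(size=(D_LATENT, D_ATTR))  # encoder, modality B

def intervene(attrs, idx, delta):
    """Apply a controllable semantic intervention: shift one annotated
    attribute by `delta` for every sample in the batch."""
    out = attrs.copy()
    out[:, idx] += delta
    return out

def icon_consistency(W_a, W_b, attrs_a, attrs_b, idx, delta):
    """Illustrative interventional-consistency penalty: the average latent
    shift caused by the *same* intervention should match across modalities,
    even though attrs_a and attrs_b are unpaired samples."""
    shift_a = (intervene(attrs_a, idx, delta) - attrs_a) @ W_a.T
    shift_b = (intervene(attrs_b, idx, delta) - attrs_b) @ W_b.T
    return float(np.mean((shift_a.mean(axis=0) - shift_b.mean(axis=0)) ** 2))

# Unpaired batches of annotated samples, one per modality.
attrs_a = rng.normal(size=(32, D_ATTR))
attrs_b = rng.normal(size=(16, D_ATTR))

# Perfectly aligned encoders incur ~0 penalty; misaligned ones do not.
print(icon_consistency(W_a, W_a, attrs_a, attrs_b, idx=0, delta=1.0))  # ~0.0
print(icon_consistency(W_a, W_b, attrs_a, attrs_b, idx=0, delta=1.0))  # > 0
```

Minimizing such a penalty pulls the annotation-specific directions of the two latent spaces into agreement without ever requiring paired samples, which is the sense in which the alignment is "partial."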
