

Poster in Workshop: First Workshop on Representational Alignment (Re-Align)

Modality-Agnostic fMRI Decoding of Vision and Language

Mitja Nikolaus · Milad Mozafari · Nicholas Asher · Leila Reddy · Rufin VanRullen

Keywords: [ computational neuroscience ] [ decoding ] [ multimodal ] [ fMRI ]


Abstract: Previous studies have shown that it is possible to map brain activation data of subjects viewing images onto the feature representation space not only of vision models (modality-specific decoding) but also of language models (cross-modal decoding). In this work, we introduce and use a new large-scale fMRI dataset ($\sim8,500$ trials per subject) of people watching both images and text descriptions of such images. This novel dataset enables the development of modality-agnostic decoders: a single decoder that can predict which stimulus a subject is seeing, irrespective of the modality (image or text) in which the stimulus is presented. We train and evaluate such decoders to map brain signals onto stimulus representations from a large range of publicly available vision, language, and multimodal (vision+language) models. Our findings reveal that (1) modality-agnostic decoders perform as well as (and sometimes even better than) modality-specific decoders, (2) modality-agnostic decoders mapping brain data onto representations from unimodal models perform as well as decoders relying on multimodal representations, and (3) while language and low-level visual (occipital) brain regions are best at decoding text and image stimuli, respectively, high-level visual (temporal) regions perform well on both stimulus types.
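
The sketch below is a rough illustration of the decoding setup described in the abstract: a single linear decoder is fit on pooled image and text trials to map fMRI voxel activations onto a pretrained model's stimulus embeddings, and is evaluated by retrieving the embedding of the stimulus that was actually shown. The data shapes, the use of Ridge regression, the regularization strength, and the cosine-similarity retrieval metric are illustrative assumptions and not the authors' exact pipeline.

# Minimal sketch of a modality-agnostic fMRI decoder (assumptions noted above).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical sizes; real data would come from the fMRI dataset and a
# pretrained vision/language/multimodal model.
n_trials, n_voxels, emb_dim = 800, 5000, 512
X = rng.standard_normal((n_trials, n_voxels))   # fMRI activations, image and text trials pooled
Y = rng.standard_normal((n_trials, emb_dim))    # stimulus embeddings from a pretrained model

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

# One decoder for both modalities: fit on the pooled trials.
decoder = Ridge(alpha=1e4)
decoder.fit(X_tr, Y_tr)
Y_pred = decoder.predict(X_te)

# Evaluate by retrieval: does the predicted embedding land closest to the
# embedding of the stimulus that was actually presented on that trial?
def cosine_sim(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

sims = cosine_sim(Y_pred, Y_te)                 # (n_test, n_test) similarity matrix
top1 = (sims.argmax(axis=1) == np.arange(len(Y_te))).mean()
print(f"top-1 retrieval accuracy: {top1:.3f}")  # chance level ~ 1/n_test on random data

Because the decoder never sees a modality label, the same weights are used whether a test trial comes from an image or a text presentation; comparing this against decoders trained on one modality only is what distinguishes modality-agnostic from modality-specific decoding in the abstract.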
