Modality-Agnostic fMRI Decoding of Vision and Language

Mitja Nikolaus, Milad Mozafari, Nicholas Asher, Leila Reddy, Rufin VanRullen

2024-03-18

Abstract: Previous studies have shown that it is possible to map brain activation data of subjects viewing images onto the feature representation space of not only vision models (modality-specific decoding) but also language models (cross-modal decoding). In this work, we introduce and use a new large-scale fMRI dataset (~8,500 trials per subject) of people watching both images and text descriptions of such images. This novel dataset enables the development of modality-agnostic decoders: a single decoder that can predict which stimulus a subject is seeing, irrespective of the modality (image or text) in which the stimulus is presented. We train and evaluate such decoders to map brain signals onto stimulus representations from a large range of publicly available vision, language and multimodal (vision+language) models. Our findings reveal that (1) modality-agnostic decoders perform as well as (and sometimes even better than) modality-specific decoders; (2) modality-agnostic decoders mapping brain data onto representations from unimodal models perform as well as decoders relying on multimodal representations; (3) while language and low-level visual (occipital) brain regions are best at decoding text and image stimuli, respectively, high-level visual (temporal) regions perform well on both stimulus types.

Computer Vision and Pattern Recognition, Computation and Language

What problem does this paper attempt to address?

The paper addresses the problem of building a modality-agnostic decoder of brain activity: a single decoder that can predict which stimulus a subject is viewing, whether an image or a text description, irrespective of the modality in which it is presented. This differs from cross-modal decoding, in which brain activity recorded while subjects view images is mapped onto the representation space of language models. To this end, the researchers introduce a new large-scale fMRI dataset (~8,500 trials per subject) recorded while subjects viewed both images and text descriptions of such images. By training and evaluating modality-agnostic decoders on this dataset, they examine three questions:

1. **Performance of the Modality-Agnostic Decoder**: Test whether a modality-agnostic decoder matches or exceeds the performance of modality-specific decoders (i.e., decoders targeting only images or only text).
2. **Multimodal vs. Unimodal Representations**: Compare decoding onto features from unimodal models (vision or language models) with decoding onto features from multimodal (vision+language) models, to determine whether multimodal representations offer an advantage for modality-agnostic decoding.
3. **Decoding Ability of Different Brain Regions**: Analyze how well different brain regions (low-level visual areas, high-level visual areas, and language-related areas) decode image and text stimuli, and identify which regions perform well on both stimulus types.

Through these analyses, the paper aims to advance the understanding of how the brain processes and integrates information from different modalities, and to provide a dataset and decoding methods for future research on modality-agnostic representations.
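The comparison between modality-specific and modality-agnostic decoders can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic, the decoder is a closed-form ridge regression, the score is a pairwise identification accuracy, and the names `fit_ridge`, `pairwise_accuracy`, `simulate` and `run_demo` are hypothetical. The only idea carried over from the text is that one decoder is trained on trials from both modalities, while the others see a single modality.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)


def pairwise_accuracy(pred, target):
    """Fraction of ordered pairs (i, j), i != j, in which prediction i
    correlates more strongly with its own target i than with target j."""
    def unit_rows(a):
        a = a - a.mean(axis=1, keepdims=True)
        return a / np.linalg.norm(a, axis=1, keepdims=True)

    sim = unit_rows(pred) @ unit_rows(target).T  # row-wise Pearson correlations
    wins = sim.diagonal()[:, None] > sim         # diagonal entries never count as wins
    n = sim.shape[0]
    return wins.sum() / (n * (n - 1))


def simulate(rng, feats, mapping, noise=1.0):
    """Synthetic 'voxel' responses: linear function of stimulus features plus noise."""
    signal = feats @ mapping
    return signal + noise * rng.standard_normal(signal.shape)


def run_demo(seed=0, n_train=200, n_test=50, n_feat=16, n_vox=100):
    rng = np.random.default_rng(seed)
    # Assumption: each modality drives the voxels through a shared mapping
    # plus a smaller modality-specific component.
    shared = rng.standard_normal((n_feat, n_vox))
    maps = {m: shared + 0.5 * rng.standard_normal((n_feat, n_vox))
            for m in ("image", "caption")}

    f_train = rng.standard_normal((n_train, n_feat))  # stand-in for model features
    f_test = rng.standard_normal((n_test, n_feat))
    x_train = {m: simulate(rng, f_train, maps[m]) for m in maps}
    x_test = {m: simulate(rng, f_test, maps[m]) for m in maps}

    decoders = {
        # Modality-specific decoders: trained on one modality only.
        "image-only": fit_ridge(x_train["image"], f_train),
        "caption-only": fit_ridge(x_train["caption"], f_train),
        # Modality-agnostic decoder: trained on trials from both modalities.
        "agnostic": fit_ridge(
            np.vstack([x_train["image"], x_train["caption"]]),
            np.vstack([f_train, f_train]),
        ),
    }
    return {(name, m): pairwise_accuracy(x_test[m] @ W, f_test)
            for name, W in decoders.items() for m in maps}


if __name__ == "__main__":
    for (decoder, modality), acc in run_demo().items():
        print(f"{decoder:>12} decoder on {modality:<7} trials: {acc:.3f}")
```

Running the script prints one accuracy per decoder and test modality, with 0.5 as chance level. The numbers reflect only the simulated mixing assumptions and say nothing about the results reported in the paper.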

Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models

Decoding Visual Neural Representations by Multimodal Learning of Brain-Visual-Linguistic Features

From Sight to Insight: A Multi-task Approach with the Visual Language Decoding Model

MindGPT: Interpreting What You See with Non-invasive Brain Recordings

Reading visually embodied meaning from the brain: Visually grounded computational models decode visual-object mental imagery induced by written text

Looking through the mind's eye via multimodal encoder-decoder networks

Neuro-Vision to Language: Enhancing Brain Recording-based Visual Reconstruction and Language Interaction

‘When’ and ‘what’ did you see? A novel fMRI-based visual decoding framework

Brain decoding: toward real-time reconstruction of visual perception

Revealing Vision-Language Integration in the Brain with Multimodal Networks

A multimodal LLM for the non-invasive decoding of spoken text from brain recordings

BrainChat: Decoding Semantic Information from fMRI using Vision-language Pretrained Models

MEG Evidence That Modality-Independent Conceptual Representations Encode Visual but Not Lexical Representations

A dual‐channel language decoding from brain activity with progressive transfer training

A neural decoding algorithm that generates language from visual activity evoked by natural images

Multimodal deep neural decoding reveals highly resolved spatiotemporal profile of visual object representation in humans

Efficient Neural Decoding Based on Multimodal Training

Retrieving and reconstructing conceptually similar images from fMRI with latent diffusion models and a neuro-inspired brain decoding model

Modelling Multimodal Integration in Human Concept Processing with Vision-and-Language Models

MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding