Abstract:

BACKGROUND AND OBJECTIVE: Most studies decode language structure from neural activity evoked by linguistic stimuli such as phrases or sentences. However, the human brain more commonly perceives the outside world through non-linguistic stimuli such as natural images, so relying on linguistic stimuli alone cannot fully capture the information the brain perceives. To address this, an end-to-end model that maps visual neural activity evoked by non-linguistic stimuli to visual content is needed.

METHODS: Inspired by the success of the Transformer network in neural machine translation and of the convolutional neural network (CNN) in computer vision, we construct a CNN-Transformer hybrid language decoding model, trained end to end, that decodes functional magnetic resonance imaging (fMRI) signals evoked by natural images into descriptive text about the visual stimuli. Specifically, a two-layer 1D CNN extracts a semantic sequence from the multi-time visual neural activity; the model encodes this sequence into a multi-level abstract representation and then decodes that representation, step by step, into an English sentence.

RESULTS: Experimental results show that the decoded texts are semantically consistent with the corresponding ground-truth annotations. Additionally, by varying the number of encoding and decoding layers and modifying the original positional encoding of the Transformer, we found that this task requires a specific Transformer architecture.

CONCLUSIONS: The results indicate that the proposed model can decode visual neural activity evoked by natural images into descriptive sentences about the visual stimuli. It may therefore serve in the future as a computer-aided tool that helps neuroscientists understand the neural mechanisms of visual information processing in the human brain.
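
For readers unfamiliar with the "original positional encoding of the Transformer" mentioned in the results, the following is a minimal sketch of the standard sinusoidal scheme, in which even dimensions use a sine and odd dimensions a cosine of the position scaled by a geometric progression of wavelengths. The abstract does not say how the authors modified this encoding, so the sketch shows only the unmodified baseline; the function names, the `base` parameter, and the toy sequence are illustrative assumptions, not taken from the paper.

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model, base=10000.0):
    """Return a seq_len x d_model table of standard sinusoidal encodings.

    Assumption: d_model is even, so every sine has a matching cosine.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(seq_len):
        row = [0.0] * d_model
        for i in range(0, d_model, 2):
            # i is the even dimension index, so i / d_model equals 2k / d_model.
            angle = pos / (base ** (i / d_model))
            row[i] = math.sin(angle)
            row[i + 1] = math.cos(angle)
        table.append(row)
    return table


def add_positions(sequence):
    """Add positional encodings element-wise to a list of feature vectors.

    `sequence` stands in for the semantic sequence produced by the 1D CNN;
    its shape here is hypothetical.
    """
    d_model = len(sequence[0])
    table = sinusoidal_positional_encoding(len(sequence), d_model)
    return [
        [x + p for x, p in zip(vec, pos_row)]
        for vec, pos_row in zip(sequence, table)
    ]


if __name__ == "__main__":
    # Toy input: 3 time steps, 4 features each (values are made up).
    toy_sequence = [[0.0] * 4 for _ in range(3)]
    for row in add_positions(toy_sequence):
        print([round(v, 4) for v in row])
```

Because the encoding depends only on position and dimension, it can be precomputed once and added to every input sequence before the first encoder layer.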

A neural decoding algorithm that generates language from visual activity evoked by natural images

Emotion Recognition with Feature Extracted from the Manifold of Brain Networks

Disrupted Motion-Related Functional Connectivity Changes in Patients with Generalized Epilepsy

MindGPT: Interpreting What You See with Non-invasive Brain Recordings

Neuro-Vision to Language: Enhancing Brain Recording-based Visual Reconstruction and Language Interaction

A dual‐channel language decoding from brain activity with progressive transfer training

Decoding Linguistic Representations of Human Brain

Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models

Decoding human brain activity with deep learning

Brain decoding: toward real-time reconstruction of visual perception

From Sight to Insight: A Multi-task Approach with the Visual Language Decoding Model

A CNN-transformer hybrid approach for decoding visual neural activity into text

Decoding Visual Neural Representations by Multimodal Learning of Brain-Visual-Linguistic Features

Language Generation from Brain Recordings

Decoding Imagined and Spoken Phrases From Non-invasive Neural (MEG) Signals

Artificial Intelligence Based Multimodal Language Decoding from Brain Activity: A Review

Towards Linguistic Neural Representation Learning and Sentence Retrieval from Electroencephalogram Recordings

Unidirectional brain-computer interface: Artificial neural network encoding natural images to fMRI response in the visual cortex

Decoding Continuous Character-based Language from Non-invasive Brain Recordings

NeuSpeech: Decode Neural signal as Speech