A CNN-transformer hybrid approach for decoding visual neural activity into text
Jiang Zhang, Chen Li, Ganwanming Liu, Min Min, Chong Wang, Jiyi Li, Yuting Wang, Hongmei Yan, Zhentao Zuo, Wei Huang, Huafu Chen
DOI: https://doi.org/10.1016/j.cmpb.2021.106586
IF: 6.1
2022-02-01
Computer Methods and Programs in Biomedicine
Abstract: BACKGROUND AND OBJECTIVE: Most studies have used neural activity evoked by linguistic stimuli, such as phrases or sentences, to decode language structure. However, the human brain more commonly perceives the outside world through non-linguistic stimuli such as natural images, so relying only on linguistic stimuli cannot fully capture the information perceived by the human brain. To address this, an end-to-end mapping model between visual neural activity evoked by non-linguistic stimuli and visual content is needed. METHODS: Inspired by the success of the Transformer network in neural machine translation and of the convolutional neural network (CNN) in computer vision, a CNN-Transformer hybrid language decoding model is constructed in an end-to-end fashion to decode functional magnetic resonance imaging (fMRI) signals evoked by natural images into descriptive texts about the visual stimuli. Specifically, the model first encodes a semantic sequence, extracted by a two-layer 1D CNN from the multi-time visual neural activity, into a multi-level abstract representation, and then decodes this representation, step by step, into an English sentence. RESULTS: Experimental results show that the decoded texts are semantically consistent with the corresponding ground truth annotations. Additionally, by varying the encoding and decoding layers and modifying the original positional encoding of the Transformer, we found that a specific architecture of the Transformer is required in this work. CONCLUSIONS: The results indicate that the proposed model can decode the visual neural activity evoked by natural images into descriptive text about the visual stimuli in the form of sentences. Hence, it may in the future serve as a computer-aided tool that helps neuroscientists understand the neural mechanisms of visual information processing in the human brain.
engineering, biomedical; computer science, interdisciplinary applications; medical informatics, theory & methods
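
The pipeline outlined under METHODS (a two-layer 1D CNN that turns multi-time fMRI activity into a sequence, followed by step-by-step decoding into a sentence) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the filter weights are random, the sizes (`hidden`, `kernel`, the number of voxels) are arbitrary, and `toy_score` is a hypothetical stand-in for a trained Transformer encoder-decoder that ignores its input and only exists to make the decoding loop runnable.

```python
import random

VOCAB = ["<bos>", "a", "dog", "runs", "<eos>"]  # toy vocabulary (assumption)

def conv1d_relu(x, filters):
    """Valid 1D convolution along time, followed by ReLU.
    x: T time steps, each a list of C_in floats.
    filters: C_out filters, each shaped [K][C_in].
    Returns T-K+1 time steps of C_out floats."""
    k = len(filters[0])
    out = []
    for t in range(len(x) - k + 1):
        step = []
        for f in filters:
            s = sum(w * x[t + dt][c]
                    for dt in range(k) for c, w in enumerate(f[dt]))
            step.append(max(0.0, s))
        out.append(step)
    return out

def random_filters(rng, c_out, k, c_in):
    """Untrained filters; a real model would learn these."""
    return [[[rng.uniform(-0.1, 0.1) for _ in range(c_in)]
             for _ in range(k)] for _ in range(c_out)]

def extract_sequence(fmri, rng, hidden=8, kernel=2):
    """Two stacked 1D conv layers over the time axis of the fMRI signal."""
    n_voxels = len(fmri[0])
    h = conv1d_relu(fmri, random_filters(rng, hidden, kernel, n_voxels))
    return conv1d_relu(h, random_filters(rng, hidden, kernel, hidden))

def toy_score(memory, tokens):
    """Hypothetical placeholder for a Transformer decoder: it ignores
    `memory` and simply prefers the next vocabulary entry in order."""
    target = min(len(tokens), len(VOCAB) - 1)
    return [1.0 if i == target else 0.0 for i in range(len(VOCAB))]

def greedy_decode(memory, score_fn, vocab, max_len=10):
    """Step-by-step decoding: pick the best-scoring token, append, repeat."""
    tokens = ["<bos>"]
    while len(tokens) < max_len + 1:
        scores = score_fn(memory, tokens)
        nxt = vocab[max(range(len(vocab)), key=scores.__getitem__)]
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return tokens[1:]

# Usage: 6 time points x 5 voxels of synthetic "fMRI" data.
rng = random.Random(0)
fmri = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(6)]
sequence = extract_sequence(fmri, rng)
print(len(sequence), len(sequence[0]))
print(greedy_decode(sequence, toy_score, VOCAB))
```

In a real model the scoring function would attend over the encoded sequence at every step, so the generated sentence would depend on the neural activity rather than on a fixed token order.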