Long short‐term memory‐based neural decoding of object categories evoked by natural images

Wei Huang, Hongmei Yan, Chong Wang, Jiyi Li, Xiaoqing Yang, Liang Li, Zhentao Zuo, Jiang Zhang, Huafu Chen
DOI: https://doi.org/10.1002/hbm.25136
IF: 4.8
2020-07-10
Human Brain Mapping
Abstract: Visual perceptual decoding is one of the important and challenging topics in cognitive neuroscience. Building a mapping model between visual response signals and visual contents is the key point of decoding. Most previous studies used peak response signals to decode object categories. However, brain activities measured by functional magnetic resonance imaging are a dynamic process with time dependence, so peak signals cannot fully represent the whole process, which may affect the performance of decoding. Here, we propose a decoding model based on long short‐term memory (LSTM) network to decode five object categories from multitime response signals evoked by natural images. Experimental results show that the average decoding accuracy using the multitime (2–6 s) response signals is 0.540 from the five subjects, which is significantly higher than that using the peak ones (6 s; accuracy: 0.492; p < .05). In addition, from the perspective of different durations, methods and visual areas, the decoding performances of the five object categories are deeply and comprehensively explored. The analysis of different durations and decoding methods reveals that the LSTM‐based decoding model with sequence simulation ability can fit the time dependence of the multitime visual response signals to achieve higher decoding performance. The comparative analysis of different visual areas demonstrates that the higher visual cortex (VC) contains more semantic category information needed for visual perceptual decoding than lower VC.
radiology, nuclear medicine & medical imaging; neurosciences; neuroimaging
What problem does this paper attempt to address?
The paper addresses the neglect of temporal dependence in visual perceptual decoding. Most previous studies decoded object categories from the peak response of the fMRI signal alone and ignored its temporal dynamics. Because brain activity measured by fMRI is a dynamic, time-dependent process, the peak signal cannot represent the whole response, which may limit decoding performance. The authors therefore propose a decoding model based on a long short-term memory (LSTM) network that decodes five object categories (horse, building, flower, fruit, and landscape) from multitime response signals evoked by natural images. Averaged over five subjects, decoding accuracy with the multitime (2–6 s) response signals is 0.540, significantly higher than the 0.492 obtained with the peak signal alone (6 s; p < .05). The authors also compare decoding performance across different durations, decoding methods, and visual areas. The duration and method comparisons indicate that the LSTM model's sequence modeling ability lets it fit the time dependence of the multitime signals, and the visual-area comparison shows that the higher visual cortex carries more semantic category information than the lower visual cortex.
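To make the contrast between multitime and peak-only inputs concrete, the following is a minimal sketch of how a sequence of response vectors can be passed through an LSTM and read out as five category probabilities. It is not the authors' implementation. The class name `TinyLSTMDecoder`, the layer sizes, the single-layer architecture, the softmax readout, the choice of five time points, and the random input data are all assumptions made for illustration. The weights are untrained, so the output shows only the data flow, not the reported accuracies.

```python
import math
import random

CATEGORIES = ["horse", "building", "flower", "fruit", "landscape"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMDecoder:
    """Hypothetical single-layer LSTM with a softmax readout (untrained)."""

    def __init__(self, n_voxels, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)
        n_in = n_voxels + n_hidden  # each gate sees [x_t, h_{t-1}]
        # Gates: input (i), forget (f), output (o), candidate (g).
        self.W = {gate: random_matrix(n_hidden, n_in, rng) for gate in "ifog"}
        self.b = {gate: [0.0] * n_hidden for gate in "ifog"}
        self.W_out = random_matrix(n_classes, n_hidden, rng)
        self.n_hidden = n_hidden

    def _gate(self, name, z, activation):
        pre = matvec(self.W[name], z)
        return [activation(p + b) for p, b in zip(pre, self.b[name])]

    def step(self, x, h, c):
        """One LSTM update: returns the new hidden and cell states."""
        z = list(x) + list(h)
        i = self._gate("i", z, sigmoid)
        f = self._gate("f", z, sigmoid)
        o = self._gate("o", z, sigmoid)
        g = self._gate("g", z, math.tanh)
        # The cell state mixes the old memory (via f) with the new candidate (via i).
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, sequence):
        """Run the sequence through the LSTM and classify from the last hidden state."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in sequence:
            h, c = self.step(x, h, c)
        logits = matvec(self.W_out, h)
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Synthetic stand-in for one trial: 5 time points x 8 voxels (random, not real fMRI data).
rng = random.Random(42)
trial = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]

decoder = TinyLSTMDecoder(n_voxels=8, n_hidden=4, n_classes=len(CATEGORIES))
multitime_probs = decoder.predict_proba(trial)   # whole sequence of time points
peak_probs = decoder.predict_proba(trial[-1:])   # final time point only
predicted = CATEGORIES[multitime_probs.index(max(multitime_probs))]
```

In this sketch the peak-only condition is simply a sequence of length one, so the recurrent state has nothing to accumulate. With the full sequence, the cell state carries information from earlier time points into the final classification, which is the property the summary credits for the higher multitime accuracy.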