From Sight to Insight: A Multi-task Approach with the Visual Language Decoding Model
Wei Huang, Pengfei Yang, Ying Tang, Fan Qin, Hengjiang Li, Diwei Wu, Wei Ren, Sizhuo Wang, Yuhao Zhao, Jing Wang, Haoxiang Liu, Jingpeng Li, Yucheng Zhu, Bo Zhou, Jingyuan Sun, Qiang Li, Kaiwen Cheng, Hongmei Yan, Huafu Chen
DOI: https://doi.org/10.1101/2024.02.16.580578
2024-02-21
Abstract: Visual neural decoding aims to unlock the mysteries of how the human brain interprets the visual world. While early studies made some progress in decoding visual activity for a single type of information, they did not simultaneously reveal the multi-level, interwoven linguistic information in the brain. Here, we developed a novel Visual Language Decoding Model (VLDM) capable of simultaneously decoding categories, semantic labels, and textual descriptions from visual perceptual activity. We selected the large-scale Natural Scenes Dataset (NSD) to ensure the efficiency of the decoding model in joint training and evaluation across multiple tasks. For category decoding, we achieved effective classification of 12 categories with an accuracy of nearly 70%, significantly surpassing the chance level. For label decoding, we attained precise prediction of 80 specific semantic labels, a 16-fold improvement over the chance level. For text decoding, the scores of the decoded text surpassed the corresponding baseline levels by wide margins on six evaluation metrics. This study contributes to applications in multi-layered brain-computer interfaces, potentially leading to more natural and efficient human-computer interaction experiences.
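To put the category-decoding figure in context, the following minimal sketch computes a chance level and the corresponding fold improvement. It assumes single-label classification with balanced classes, which the abstract does not state, so the paper's own chance level may differ. The function names `chance_level` and `fold_over_chance` are hypothetical and used only for illustration.

```python
def chance_level(num_classes: int) -> float:
    """Chance accuracy for single-label classification.

    Assumption: classes are balanced, so a random guess is correct
    with probability 1 / num_classes.
    """
    if num_classes <= 0:
        raise ValueError("num_classes must be positive")
    return 1.0 / num_classes


def fold_over_chance(accuracy: float, num_classes: int) -> float:
    """Ratio of an observed accuracy to the balanced-class chance level."""
    return accuracy / chance_level(num_classes)


if __name__ == "__main__":
    # 12 categories and roughly 70% accuracy are taken from the abstract;
    # 0.70 is used here as a rounded stand-in for "nearly 70%".
    print(f"chance level: {chance_level(12):.4f}")
    print(f"fold over chance: {fold_over_chance(0.70, 12):.2f}")
```

Under these assumptions, the 12-category chance level is 1/12 (about 8.3%). The same ratio is not computed for the 80-label task, because the chance level of a multi-label prediction depends on the metric used, which the abstract does not specify.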
Neuroscience