VSCA: A Sentence Matching Model Incorporating Visual Perception

Zhe Zhang, Guangli Xiao, Yurong Qian, Mengnan Ma, Hongyong Leng, Tao Zhang
DOI: https://doi.org/10.1007/s12559-022-10074-8
IF: 4.89
2022-12-09
Cognitive Computation
Abstract: Stacking multiple layers of attention networks can significantly improve a model's performance. However, this also increases the model's time and space complexity, making it difficult for the model to capture detailed information on the underlying features. We propose a novel sentence matching model (VSCA) that uses a new attention mechanism based on variational autoencoders (VAE), which exploits the contextual information in sentences to construct a basic attention feature map and combines it with VAE to generate multiple sets of related attention feature maps for fusion. Furthermore, VSCA introduces a spatial attention mechanism that combines visual perception to capture multilevel semantic information. The experimental results show that our proposed model outperforms pretrained models such as BERT on the LCQMC dataset and performs well on the PAWS-X data. Our work consists of two parts. The first part compares the proposed sentence matching model with state-of-the-art pretrained models such as BERT. The second part conducts innovative research on applying VAE and spatial attention mechanisms in NLP. The experimental results on the related datasets show that the proposed method has satisfactory performance, and VSCA can capture rich attentional information and detailed information with less time and space complexity. This work provides insights into the application of VAE and spatial attention mechanisms in NLP.
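The abstract describes building a basic attention feature map from two sentences, using a VAE to generate several related maps, and fusing them. The NumPy sketch below illustrates only that general idea and is not the authors' implementation: the linear encoder and decoder with random, untrained weights, the residual connection back to the base map, the mean fusion, and all function names are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def base_attention_map(a, b):
    # Scaled dot-product scores between the token vectors of two sentences.
    return a @ b.T / np.sqrt(a.shape[-1])

def vae_attention_maps(base, n_maps, latent_dim, rng):
    # Encode the base map to a Gaussian latent, sample n_maps latents,
    # and decode each sample into a related attention map.
    flat = base.reshape(-1)
    # Hypothetical, untrained linear encoder/decoder weights (assumption).
    w_mu = rng.normal(scale=0.1, size=(latent_dim, flat.size))
    w_logvar = rng.normal(scale=0.1, size=(latent_dim, flat.size))
    w_dec = rng.normal(scale=0.1, size=(flat.size, latent_dim))
    mu = w_mu @ flat
    logvar = w_logvar @ flat
    # Reparameterization trick: z = mu + sigma * eps.
    eps = rng.normal(size=(n_maps, latent_dim))
    z = mu + np.exp(0.5 * logvar) * eps
    decoded = z @ w_dec.T
    # Residual connection to the base map (assumption, keeps maps related).
    return decoded.reshape(n_maps, *base.shape) + base

def fuse(maps):
    # Average the generated maps, then normalize each row to a distribution.
    return softmax(maps.mean(axis=0), axis=-1)

rng = np.random.default_rng(0)
sent_a = rng.normal(size=(5, 16))   # 5 tokens, 16-dim embeddings (toy data)
sent_b = rng.normal(size=(7, 16))   # 7 tokens, 16-dim embeddings (toy data)
base = base_attention_map(sent_a, sent_b)
maps = vae_attention_maps(base, n_maps=4, latent_dim=8, rng=rng)
attn = fuse(maps)
aligned_b = attn @ sent_b           # sent_b content aligned to each token of sent_a
```

In a trained model the encoder and decoder weights would be learned together with the matching objective; here they are random, so the example demonstrates only the data flow and tensor shapes.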
computer science, artificial intelligence, neurosciences