Image–Text Sentiment Analysis Via Context Guided Adaptive Fine-Tuning Transformer
Xingwang Xiao, Yuanyuan Pu, Zhengpeng Zhao, Rencan Nie, Dan Xu, Wenhua Qian, Hao Wu
DOI: https://doi.org/10.1007/s11063-022-11124-w
IF: 2.565
2022-12-27
Neural Processing Letters
Abstract: Compared with single-modal content, multimodal content conveys users' sentiments and feelings more vividly, so multimodal sentiment analysis has become a research hotspot. Because deep learning-based methods are data-hungry, transfer learning is extensively used. However, most transfer learning-based approaches move a model pre-trained on the source domain to the target domain either by treating it as a feature extractor (i.e., parameters are frozen) or by applying a global fine-tuning strategy (i.e., parameters are trainable). As a result, the advantages of the source and target domains are not both retained. In this paper, we propose a novel Context Guided Adaptive Fine-tuning Transformer (CGAFT) that adaptively draws on the strengths of both source and target domains for image–text sentiment analysis. In CGAFT, a Context Guided Policy Network first generates optimal weights for each image–text instance. These weights indicate how much image sentiment information should be absorbed from each layer of the image model pre-trained on the source domain and of the parallel model fine-tuned on the target domain. The image–text instance and its weights are then fed into a Sentiment Analysis Network, which extracts contextual image sentiment representations absorbed from both source and target domains to improve image–text sentiment analysis. In addition, we observe that no publicly available image–text dataset is in Chinese. To fill this gap, we build Flickr-ICT, an image–Chinese text dataset containing 13,874 image–Chinese text pairs. Experiments on three image–text datasets demonstrate that CGAFT outperforms strong baselines.
computer science, artificial intelligence
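Illustrative sketch (not the authors' implementation): the abstract describes a policy network that emits per-instance weights for blending a frozen pre-trained image model with a parallel fine-tuned copy, layer by layer. The toy Python below shows one way such a blend could look. Everything in it is an assumption made for illustration: the convex combination `w * frozen + (1 - w) * tuned`, the sigmoid policy head, the concatenated image and text context vector, and the random linear layers standing in for real model layers.

```python
import math
import random


def linear(weights, bias, x):
    """Apply y = W x + b, with W as a list of rows and x as a list of floats."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_layer(rng, dim):
    """Hypothetical stand-in for one image-model layer: random linear map + tanh."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    b = [0.0] * dim
    return lambda x: [math.tanh(v) for v in linear(W, b, x)]


def policy_weights(policy_W, policy_b, context):
    """Map a context vector to one weight in (0, 1) per layer (assumed sigmoid head)."""
    return [sigmoid(z) for z in linear(policy_W, policy_b, context)]


def adaptive_forward(frozen_layers, tuned_layers, weights, image_feat):
    """Blend frozen and fine-tuned outputs at every layer (assumed convex blend)."""
    h = image_feat
    for frozen, tuned, w in zip(frozen_layers, tuned_layers, weights):
        f_out = frozen(h)  # source-domain (frozen) branch
        t_out = tuned(h)   # target-domain (fine-tuned) branch
        h = [w * f + (1.0 - w) * t for f, t in zip(f_out, t_out)]
    return h


rng = random.Random(0)
dim, n_layers = 4, 3
frozen_layers = [make_layer(rng, dim) for _ in range(n_layers)]
tuned_layers = [make_layer(rng, dim) for _ in range(n_layers)]
policy_W = [[rng.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(n_layers)]
policy_b = [0.0] * n_layers

# Toy features for a single image-text instance.
image_feat = [0.2, -0.1, 0.4, 0.0]
text_feat = [0.3, 0.3, -0.2, 0.1]
context = image_feat + text_feat  # concatenation used as the "context"

weights = policy_weights(policy_W, policy_b, context)
fused = adaptive_forward(frozen_layers, tuned_layers, weights, image_feat)
```

In this sketch, a weight of 1.0 at every layer reduces to pure feature extraction with the frozen model and a weight of 0.0 reduces to using only the fine-tuned model, the two extremes the abstract contrasts. Intermediate weights mix the two per instance.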