Cross-modal sentiment analysis based on Transformer and image-text collaborative interaction

Liang Wang, Yuanyuan Zhang, Haicheng Wang
DOI: https://doi.org/10.1109/cedl60560.2023.00024
2023-06-29
Abstract: To make full use of the interaction between images and text, improve feature extraction for each modality, and raise the accuracy of multi-modal classification, this paper proposes a sentiment classification method based on Transformer and image-text collaborative interaction. The model extracts both image and text features with Transformer-based encoders: text feature extraction is augmented with a bidirectional long short-term memory (BiLSTM) network and an attention mechanism, and a Vision Transformer (ViT) is used to obtain the visual features of the image. A fusion method based on information-adaptive weight adjustment is then designed, which adjusts the fusion weights in real time according to the image and text feature information and performs weighted fusion. Experiments show that the model improves accuracy and F1 score.
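The abstract does not give the fusion formula, so the following is only a minimal sketch of one way an adaptive weighted fusion could work, not the authors' method. It assumes that each modality is already encoded as a fixed-length feature vector of the same size, that a relevance score per modality is a dot product with a scoring vector (hypothetical names `w_text` and `w_image`, which would be learned in practice), and that the two scores are normalized with a softmax to give the fusion weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def adaptive_fusion(text_feat, image_feat, w_text, w_image):
    """Fuse text and image features with input-dependent weights.

    Assumption (not from the paper): each modality gets a scalar score
    from a scoring vector, and a softmax over the two scores yields the
    fusion weights, so the weights change with every input pair.
    """
    if len(text_feat) != len(image_feat):
        raise ValueError("text and image features must have the same length")
    s_text = dot(w_text, text_feat)      # relevance score of the text modality
    s_image = dot(w_image, image_feat)   # relevance score of the image modality
    alpha_text, alpha_image = softmax([s_text, s_image])
    fused = [alpha_text * t + alpha_image * v
             for t, v in zip(text_feat, image_feat)]
    return fused, (alpha_text, alpha_image)


# Toy features standing in for BiLSTM/attention text output and ViT image output.
text_feat = [0.2, -0.1, 0.7]
image_feat = [0.5, 0.3, -0.4]
w_text = [1.0, 0.0, 1.0]    # hypothetical scoring vector; learned in practice
w_image = [1.0, 1.0, 0.0]   # hypothetical scoring vector; learned in practice

fused, (alpha_text, alpha_image) = adaptive_fusion(
    text_feat, image_feat, w_text, w_image
)
print(fused, alpha_text, alpha_image)
```

Because the weights are computed from the features themselves, a sample whose text is more informative than its image receives a larger text weight, which is the behavior the abstract describes as adjusting the fusion weights according to the feature information.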
Computer Science