Learning Speaker-Independent Multimodal Representation for Sentiment Analysis
Jianwen Wang, Shiping Wang, Mingwei Lin, Zeshui Xu, Wenzhong Guo
DOI: https://doi.org/10.1016/j.ins.2023.01.116
IF: 8.1
2023-02-01
Information Sciences
Abstract: Multimodal sentiment analysis is an actively growing research area that uses language, acoustic, and visual signals to predict sentiment inclination. Compared to language, acoustic and visual features carry a more evident personal style, which may degrade the model's generalization capability. The issue is exacerbated in a speaker-independent setting, where the model encounters samples from unseen speakers at test time. To mitigate the impact of personal style, we propose a framework named SIMR for learning speaker-independent multimodal representations. This framework separates the non-verbal inputs into a style encoding and a content representation with the aid of informative cross-modal correlations. In addition, when integrating complementary cross-modal information, classical transformer-based approaches are inherently inclined to discover compatible cross-modal interactions but ignore incompatible ones. In contrast, we suggest locating both simultaneously through an enhanced cross-modal transformer module. Experimental results show that the proposed model achieves state-of-the-art performance on several datasets.
computer science, information systems