Inter-Intra Modal Representation Augmentation with Trimodal Collaborative Disentanglement Network for Multimodal Sentiment Analysis

Chen, Hansheng Hong, Jie Guo, Bin Song
DOI: https://doi.org/10.1109/taslp.2023.3263801
2023-01-01
Abstract: Multimodal Sentiment Analysis (MSA) is a challenging research area because humans express emotional cues across several modalities, such as language, facial expressions, and speech. Feature representation and fusion are the most crucial tasks in MSA research. However, most current methods overlook the importance of eliminating potentially irrelevant features from the original features of each modality and from the cross-modal common features. Moreover, the features extracted from all modalities contain cluttered background noise and various occlusion noise, which negatively affects feature alignment. In contrast to these methods, we propose a novel Trimodal Collaborative Disentanglement Network (TCDN) to address these problems. TCDN improves sentiment prediction in two ways: i) the trimodal collaborative component uses the L1-norm to eliminate irrelevant features and unify the characteristics of the three modalities (inter-modal); ii) the disentanglement network introduces adversarial noise by combining the original features of each single modality with the common representation, alleviating the background noise within each modality (intra-modal). To the best of our knowledge, this inter-intra modal feature augmentation method is the first to obtain the common representation through data augmentation. Extensive experiments on two benchmark datasets, MOSI and MOSEI, demonstrate the superiority of TCDN over state-of-the-art methods.
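The abstract does not spell out how the L1-norm is applied, so the following is only a generic illustration of why an L1 penalty tends to eliminate weak features, not the TCDN formulation. It uses soft-thresholding, the proximal operator of the L1-norm, on three toy feature vectors. The names `soft_threshold`, `l1_norm`, `modalities`, and `LAMBDA`, along with all numeric values, are assumptions made for this sketch.

```python
# Illustrative sketch only: soft-thresholding is the proximal operator of
# lam * ||x||_1. It is NOT taken from the paper; it merely shows how an
# L1 penalty drives small (likely irrelevant) feature entries to exactly zero.

def soft_threshold(features, lam):
    """Shrink each entry toward zero by lam; entries with |x| <= lam become 0."""
    out = []
    for x in features:
        if x > lam:
            out.append(x - lam)
        elif x < -lam:
            out.append(x + lam)
        else:
            out.append(0.0)
    return out


def l1_norm(features):
    """Sum of absolute values of the entries."""
    return sum(abs(x) for x in features)


# Hypothetical toy feature vectors for the three modalities (made-up values).
modalities = {
    "text": [0.9, -0.05, 0.4, 0.02],
    "audio": [0.03, 0.7, -0.01, -0.6],
    "video": [-0.8, 0.04, 0.05, 0.3],
}

LAMBDA = 0.1  # assumed sparsity strength, chosen only for the example

sparse = {name: soft_threshold(vec, LAMBDA) for name, vec in modalities.items()}

for name in modalities:
    print(name, "L1 before:", l1_norm(modalities[name]),
          "L1 after:", l1_norm(sparse[name]), "features:", sparse[name])
```

In this toy setting, entries whose magnitude is at most `LAMBDA` are zeroed while larger entries are only shrunk, which is the sense in which an L1 term can discard weak features and keep strong ones.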