Multimodal Temporal Attention in Sentiment Analysis

Yu He, Licai Sun, Zheng Lian, Bin Liu, Jianhua Tao, Meng Wang, Yuan Cheng
DOI: https://doi.org/10.1145/3551876.3554811
2022-01-01
Abstract: In this paper, we present our solution to the MuSe-Stress sub-challenge of the MuSe 2022 Multimodal Sentiment Analysis Challenge. The MuSe-Stress task is to predict time-continuous values of physiological arousal and valence from multimodal data comprising audio, visual, text, and physiological signals. In this competition, we found that multimodal fusion performs well for physiological arousal on the validation set but poorly on the test set. We believe this may be due to over-fitting caused by the model's over-reliance on features from specific modalities. To address this problem, we propose Multimodal Temporal Attention (MMTA), which models the temporal effects of all modalities on each unimodal branch, enabling interaction between the unimodal branches and adaptive balancing across modalities. On the test set, we obtain concordance correlation coefficients (CCCs) of 0.6818 for physiological arousal (with MMTA) and 0.6841 for valence (with early fusion), both ranking first and outperforming the baseline system (0.4761 and 0.4931, respectively) by a large margin.
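The abstract reports its results as concordance correlation coefficients (CCC) without defining the metric. As a minimal sketch, assuming the standard definition CCC = 2·cov(x, y) / (σ_x² + σ_y² + (μ_x − μ_y)²) with population (divide-by-n) statistics, it can be computed in pure Python as shown here. The function name `ccc` is illustrative, and the challenge's official evaluation code may differ in details such as how scores are aggregated across recordings.

```python
from typing import Sequence


def ccc(preds: Sequence[float], labels: Sequence[float]) -> float:
    """Concordance correlation coefficient between two equal-length series.

    Assumption: standard CCC with population (divide-by-n) variance and
    covariance; this is an illustrative sketch, not the challenge's
    official scoring script.
    """
    n = len(preds)
    if n == 0 or n != len(labels):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_p = sum(preds) / n
    mean_l = sum(labels) / n

    # Population variances and covariance (divide by n, not n - 1).
    var_p = sum((p - mean_p) ** 2 for p in preds) / n
    var_l = sum((y - mean_l) ** 2 for y in labels) / n
    cov = sum((p - mean_p) * (y - mean_l) for p, y in zip(preds, labels)) / n

    # Unlike Pearson correlation, the denominator also penalizes a shift
    # between the two means, so biased predictions score lower.
    denom = var_p + var_l + (mean_p - mean_l) ** 2
    if denom == 0.0:
        raise ValueError("CCC is undefined for two identical constant series")
    return 2.0 * cov / denom
```

For example, `ccc([1, 2, 3], [2, 3, 4])` returns 4/7 (about 0.571): the two series are perfectly correlated (Pearson r = 1), but the constant offset of 1 lowers the CCC, which is why the metric suits time-continuous prediction tasks where both trend and level matter.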