Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment

Yue Gu, Kangning Yang, Shiyu Fu, Shuhong Chen, Xinyu Li, Ivan Marsic
Abstract: Multimodal affective computing, which aims to recognize and interpret human affect and subjective information from multiple data sources, remains challenging because (i) it is hard to extract informative features that represent human affect from heterogeneous inputs, and (ii) current fusion strategies fuse modalities only at abstract levels, ignoring time-dependent interactions between them. To address these issues, we introduce a hierarchical multimodal architecture with attention and word-level fusion to classify utterance-level sentiment and emotion from text and audio data. The proposed model outperforms state-of-the-art approaches on published datasets, and we demonstrate that its synchronized attention over modalities offers visual interpretability.
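To make the idea of word-level fusion with attention more concrete, the following is a minimal sketch and not the authors' architecture. It assumes each word already has one text feature vector and one time-aligned audio feature vector, computes an attention weight per word in each modality, and concatenates the weighted features word by word. The function names, the dot-product scoring, the context vectors, and the toy numbers are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(features, context):
    """One weight per word: softmax of dot(feature, context).

    `context` stands in for a learned context vector (an assumption).
    """
    scores = [sum(f * c for f, c in zip(feat, context)) for feat in features]
    return softmax(scores)


def word_level_fusion(text_feats, audio_feats, text_ctx, audio_ctx):
    """Fuse word-aligned text and audio features.

    Illustrative scheme: scale each word's features by that modality's
    attention weight, then concatenate the two modalities per word.
    """
    if len(text_feats) != len(audio_feats):
        raise ValueError("text and audio must be aligned word by word")
    a_text = attention_weights(text_feats, text_ctx)
    a_audio = attention_weights(audio_feats, audio_ctx)
    fused = [
        [wt * x for x in t] + [wa * x for x in a]
        for t, a, wt, wa in zip(text_feats, audio_feats, a_text, a_audio)
    ]
    return fused, a_text, a_audio


# Toy utterance with three words; all values are made up.
text_feats = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
audio_feats = [[0.5], [1.5], [0.0]]
fused, a_text, a_audio = word_level_fusion(
    text_feats, audio_feats, text_ctx=[0.0, 1.0], audio_ctx=[1.0]
)
```

Because both modalities are indexed by the same word positions, the two attention distributions can be compared or plotted side by side, which is the kind of interpretability the abstract alludes to.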