Learning Multi-Task Commonness and Uniqueness for Multi-Modal Sarcasm Detection and Sentiment Analysis in Conversation

Yazhou Zhang, Yang Yu, Dongming Zhao, Zuhe Li, Bo Wang, Yuexian Hou, Prayag Tiwari, Jing Qin
DOI: https://doi.org/10.1109/tai.2023.3298328
2023-01-01
IEEE Transactions on Artificial Intelligence
Abstract: Sarcasm is a figurative language device for expressing inner feelings, in which the author writes a sentence that is positive on the surface while actually expressing negative sentiment, or vice versa. Sentiment is therefore closely related to sarcasm, which has led to the recent popularity of joint multi-modal sarcasm and sentiment detection in conversation (dialogue). The key challenges are multi-modal fusion and multi-task interaction. Most existing studies have focused on building a fused multi-modal representation, while the commonness and uniqueness across related tasks have not received attention. To fill this gap, we propose a multi-modal multi-task interaction learning framework, termed MIL, for joint detection of sarcasm and sentiment. Specifically, a cross-modal target attention mechanism is proposed to automatically learn the alignment between texts and images/speech. In addition, a multi-modal interaction learning paradigm, consisting of a dual-gating network and three separate fully-connected layers, is designed to simultaneously capture the commonness and uniqueness. Comprehensive experiments on two benchmark datasets (i.e., Memotion and MUStARD) show the effectiveness of the proposed model over state-of-the-art baselines, with significant improvements of 1.9% and 2.4% in F1.