Debiasing Multimodal Sarcasm Detection with Contrastive Learning

Mengzhao Jia, Can Xie, Liqiang Jing
2023-12-19
Abstract: Despite commendable achievements made by existing work, prevailing multimodal sarcasm detection studies rely more on textual content over visual information. It unavoidably induces spurious correlations between textual words and labels, thereby significantly hindering the models' generalization capability. To address this problem, we define the task of out-of-distribution (OOD) multimodal sarcasm detection, which aims to evaluate models' generalizability when the word distribution is different in training and testing settings. Moreover, we propose a novel debiasing multimodal sarcasm detection framework with contrastive learning, which aims to mitigate the harmful effect of biased textual factors for robust OOD generalization. In particular, we first design counterfactual data augmentation to construct the positive samples with dissimilar word biases and negative samples with similar word biases. Subsequently, we devise an adapted debiasing contrastive learning mechanism to empower the model to learn robust task-relevant features and alleviate the adverse effect of biased words. Extensive experiments show the superiority of the proposed framework.
Computation and Language, Multimedia
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses bias in Multimodal Sarcasm Detection (MSD). Existing MSD research relies too heavily on textual information and underuses visual information, so models perform poorly when the training and testing data distributions differ, i.e., in Out-of-Distribution (OOD) scenarios. Over-reliance on text exposes the model to unreliable cues such as biased words, which leads to incorrect predictions.

### Background and motivation

With the rise of social media, people increasingly use sarcasm to voice their opinions online, so detecting it accurately matters for sentiment analysis and opinion mining. Early research focused on text-only methods, but people now often express emotions and opinions through multimodal content (text and image). The image frequently carries key cues for conveying sarcasm, which has made multimodal sarcasm detection an active research topic. Existing multimodal sarcasm detection models nevertheless have two problems:

1. **Over-reliance on textual information**: Existing models lean on textual rather than visual information, which leaves them vulnerable to biased words in the text and weakens their generalization ability.
2. **Poor performance in OOD scenarios**: When the training and testing data distributions differ, model performance drops significantly because the models depend on spurious correlations in the training data.
### Solutions

To address these problems, the authors propose a new task, OOD multimodal sarcasm detection, and design a debiasing multimodal sarcasm detection framework (DMSD-CL) built on contrastive learning. It has two components:

1. **Counterfactual data augmentation**: For each sample, construct negative samples that share similar biased words but carry the opposite label, and positive samples that contain different biased words but carry the same label.
2. **Adapted debiasing contrastive learning**: Re-weight the contrastive loss so that the model better separates samples with similar biased words but different labels, and pulls together samples with different biased words but the same label.

### Experimental results

The authors ran experiments on publicly available multimodal sarcasm detection benchmark datasets. The results show that the proposed DMSD-CL framework performs well on both the standard (IID) test set and the OOD test set. In OOD scenarios in particular, it performs significantly better than existing methods.

### Main contributions

1. **Defined a new OOD multimodal sarcasm detection task** to evaluate how well models generalize in OOD scenarios.
2. **Proposed a debiasing multimodal sarcasm detection framework based on contrastive learning**, which improves generalization through counterfactual data augmentation and adapted debiasing contrastive learning.
3. **Constructed an OOD test set** and verified the effectiveness of the proposed method on it.

### Conclusion

By introducing the OOD multimodal sarcasm detection task and a debiasing contrastive learning framework, this paper improves on the poor OOD performance of existing models and offers a new direction for research on multimodal sarcasm detection.
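
The re-weighted contrastive objective described under Solutions can be illustrated with a small sketch. This is not the authors' implementation. It is a generic InfoNCE-style loss in plain Python, assuming cosine similarity, a temperature of 0.1, and caller-supplied per-negative weights. The function names `cosine` and `weighted_contrastive_loss` and the parameter `neg_weights` are hypothetical, and the summary does not specify the paper's actual weighting scheme.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_contrastive_loss(anchor, positives, negatives,
                              neg_weights=None, temperature=0.1):
    """InfoNCE-style loss with per-negative weights (illustrative only).

    positives: embeddings with the SAME label but DIFFERENT biased words.
    negatives: embeddings with the OPPOSITE label but SIMILAR biased words.
    neg_weights: assumed re-weighting factors; larger values penalize
        similarity to that negative more strongly. Defaults to 1.0 each.
    """
    if neg_weights is None:
        neg_weights = [1.0] * len(negatives)

    # Exponentiated, temperature-scaled similarities to the anchor.
    pos_sum = sum(math.exp(cosine(anchor, p) / temperature) for p in positives)
    neg_sum = sum(w * math.exp(cosine(anchor, n) / temperature)
                  for w, n in zip(neg_weights, negatives))

    # Loss is small when the anchor is close to positives and far from negatives.
    return -math.log(pos_sum / (pos_sum + neg_sum))


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    same_label_other_words = [[0.9, 0.1]]   # positive sample
    other_label_same_words = [[0.0, 1.0]]   # negative sample
    print(weighted_contrastive_loss(anchor, same_label_other_words,
                                    other_label_same_words))
```

Minimizing this loss pulls the anchor toward its positives and pushes it away from its negatives. Raising a negative's weight increases the penalty for staying close to a sample that shares biased words but has the opposite label.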