Towards an Intrinsic Interpretability Approach for Multimodal Hate Speech Detection

Pengfei Du, Yali Gao, Xiaoyong Li
DOI: https://doi.org/10.1142/s0218001422500409
IF: 1.261
2022-01-01
International Journal of Pattern Recognition and Artificial Intelligence
Abstract: With the development of social media, multimodal hate speech that relies on both images and text has become an emerging way of spreading hate, and detecting it is an increasingly challenging task. While many works based on neural networks and multimodal machine learning have been proposed to detect multimodal hate speech, only a few attempts have been made to address the interpretability of the task. This makes it difficult to analyze prediction results and improve models. Therefore, this paper investigates the interpretable multimodal hate speech detection task and develops an intrinsically interpretable deep learning method by leveraging the multimodal architecture. Specifically, we use a multimodal pretrained model as the backbone for the final detection results and add, in parallel, an interpretability module trained jointly with it, which processes the input tokens and fine-grained tags through a filter-gate attention mechanism. The interpretability module provides an interpretable basis for the final judgment. We conduct experiments on a hate speech detection dataset and demonstrate that our proposed method not only significantly outperforms other methods but also provides interpretable insights into the decisions of our model.