Harmfully Manipulated Images Matter in Multimodal Misinformation Detection

Bing Wang, Shengsheng Wang, Changchun Li, Renchu Guan, Ximing Li
2024-07-27
Abstract: Nowadays, misinformation spreads widely across social media platforms and causes extremely negative impacts on society. To combat this issue, automatically identifying misinformation, especially misinformation containing multimodal content, has attracted growing attention from the academic and industrial communities, and has given rise to an active research topic named Multimodal Misinformation Detection (MMD). Typically, existing MMD methods capture the semantic correlation and inconsistency between multiple modalities, but neglect some potential clues in multimodal content. Recent studies suggest that manipulation traces in the images of articles are non-trivial clues for detecting misinformation. Meanwhile, we find that the underlying intentions behind the manipulation, e.g., harmful and harmless, also matter in MMD. Accordingly, in this work, we propose to detect misinformation by learning manipulation features that indicate whether the image has been manipulated, as well as intention features regarding the harmful and harmless intentions of the manipulation. Unfortunately, the manipulation and intention labels that make these features discriminative are unknown. To overcome this problem, we propose two weakly supervised signals as alternatives by introducing additional datasets on image manipulation detection and formulating two classification tasks as positive and unlabeled learning problems. Based on these ideas, we propose a novel MMD method, namely Harmfully Manipulated Images Matter in MMD (HAMI-M3D). Extensive experiments across three benchmark datasets demonstrate that HAMI-M3D consistently improves the performance of any MMD baseline.
Computation and Language, Computer Vision and Pattern Recognition, Multimedia
What problem does this paper attempt to address?
The paper addresses misinformation detection over multimodal (text and image) content, and in particular how traces of image manipulation and the intentions behind that manipulation can be used to improve detection performance. Its core contributions are:

1. **Proposing a new multimodal misinformation detection method** (HAMI-M3D), which extracts features indicating whether an image has been manipulated and whether the intention behind the manipulation is harmful or harmless, and integrates these features with existing multimodal features to enhance the classifier's discriminative ability.
2. **Addressing the problem of unknown manipulation and intention labels** by introducing an additional image manipulation detection dataset for knowledge distillation and using Positive-Unlabeled (PU) learning to train the manipulation detector and the intention classifier.
3. **Demonstrating the effectiveness of the method through experiments**. Results on three benchmark datasets show that HAMI-M3D improves the average performance of baseline models by about 1.21 points.

The workflow of HAMI-M3D is as follows:

- **Feature Encoder Module**: Includes a text encoder, image encoder, manipulation encoder, and intention encoder to extract different types of features.
- **Feature Fusion Module**: Uses a multi-head attention network to integrate the extracted features into a single comprehensive feature.
- **Predictor Module**: Includes an authenticity classifier, manipulation classifier, and intention classifier to predict the authenticity of the article, whether the image has been manipulated, and the intention behind the manipulation, respectively.

To overcome the problem of unknown manipulation and intention labels, the authors adopt two strategies:

1. **Manipulation Classification**: First, pre-train a teacher model on an additional image manipulation detection dataset, then adapt it to the multimodal misinformation detection dataset using Positive-Unlabeled learning.
2. **Intention Classification**: If the image of a real article has been manipulated, the intention must be harmless. This fact turns intention classification into a Positive-Unlabeled learning problem, with manipulated images from real articles serving as known harmless examples. A second fact, that an article whose image was manipulated with harmful intention must be false, is used to check the reliability of the predictions.

Finally, a series of experiments validates the effectiveness and superiority of the proposed method.
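
The Positive-Unlabeled formulation and the two intention facts described above can be made concrete with a small sketch. The summary does not say which PU estimator the authors use, so the code below assumes the widely used non-negative PU risk with a sigmoid surrogate loss. The class prior `prior`, the dictionary keys `manipulated` and `is_real`, and all function names are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss 1 / (1 + exp(label * score)), label in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(label * score))


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk estimate (an assumed choice of estimator).

    pos_scores: classifier scores on known-positive samples
    unl_scores: classifier scores on unlabeled samples
    prior:      assumed fraction of positives among the unlabeled data
    """
    def mean(values):
        return sum(values) / len(values)

    risk_pos = prior * mean([sigmoid_loss(s, +1) for s in pos_scores])
    risk_pos_as_neg = prior * mean([sigmoid_loss(s, -1) for s in pos_scores])
    risk_unl_as_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])
    # Clamp the estimated negative-class risk at zero so it cannot go negative.
    return risk_pos + max(0.0, risk_unl_as_neg - risk_pos_as_neg)


def split_for_intention_pu(samples):
    """Split samples into harmless positives and unlabeled ones.

    Fact 1: a manipulated image in a real article must be harmless (positive).
    Manipulated images in fake articles have unknown intention (unlabeled).
    In practice the 'manipulated' flag would itself be a model prediction.
    """
    positives, unlabeled = [], []
    for sample in samples:
        if not sample["manipulated"]:
            continue  # intention is only defined for manipulated images
        (positives if sample["is_real"] else unlabeled).append(sample)
    return positives, unlabeled


def is_consistent(pred_manipulated, pred_harmful, pred_fake):
    """Fact 2: a harmfully manipulated image implies the article is fake."""
    return not (pred_manipulated and pred_harmful and not pred_fake)
```

A prediction that flags an image as harmfully manipulated while calling the article real fails `is_consistent`, which is one simple way to read the reliability check mentioned above.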