Abstract: Misinformation spreads widely across social media platforms and has severe negative impacts on society. To combat this issue, automatically identifying misinformation, especially misinformation containing multimodal content, has attracted growing attention from the academic and industrial communities, giving rise to an active research topic named Multimodal Misinformation Detection (MMD). Typically, existing MMD methods capture the semantic correlation and inconsistency between multiple modalities, but neglect some potential clues in multimodal content. Recent studies suggest that manipulation traces in the images of articles are non-trivial clues for detecting misinformation. Meanwhile, we find that the underlying intention behind the manipulation, e.g., harmful or harmless, also matters in MMD. Accordingly, in this work, we propose to detect misinformation by learning manipulation features that indicate whether the image has been manipulated, as well as intention features that indicate whether the manipulation is harmful or harmless. Unfortunately, the manipulation and intention labels that make these features discriminative are unknown. To overcome this problem, we propose two weakly supervised signals as alternatives by introducing additional datasets on image manipulation detection and formulating the two classification tasks as positive and unlabeled learning problems. Based on these ideas, we propose a novel MMD method, namely Harmfully Manipulated Images Matter in MMD (HAMI-M3D). Extensive experiments across three benchmark datasets demonstrate that HAMI-M3D consistently improves the performance of any MMD baseline.

What problem does this paper attempt to address?

The paper addresses misinformation detection over multimodal (text and image) content, and in particular how image manipulation traces and the intentions behind them can be used to improve detection performance. Its core contributions are:

1. **Proposing a new multimodal misinformation detection method** (HAMI-M3D), which extracts features indicating whether an image has been manipulated and the intention behind the manipulation (harmful or harmless), and integrates these features into existing multimodal features to enhance the classifier's discriminative ability.
2. **Addressing the problem of unknown manipulation and intention labels** by introducing an additional image manipulation detection dataset for knowledge distillation and using Positive-Unlabeled (PU) learning to train the manipulation detector and intention classifier.
3. **Demonstrating the effectiveness of the method through experiments**. Results on three benchmark datasets show that HAMI-M3D improves the average performance of baseline models by about 1.21 points.

Specifically, the workflow of HAMI-M3D is as follows:

- **Feature Encoder Module**: Includes a text encoder, image encoder, manipulation encoder, and intention encoder to extract different types of features.
- **Feature Fusion Module**: Uses a multi-head attention network to integrate the extracted features into a single comprehensive feature.
- **Predictor Module**: Includes an authenticity classifier, manipulation classifier, and intention classifier to predict the authenticity of the article, whether the image has been manipulated, and the intention behind the manipulation, respectively.

To overcome the problem of unknown manipulation and intention labels, the authors adopt two strategies:

1. **Manipulation Classification**: First, pre-train a teacher model on an additional image manipulation detection dataset, then adapt it to the multimodal misinformation detection dataset using Positive-Unlabeled learning.
2. **Intention Classification**: If the image of a real article is manipulated, the intention must be harmless. Based on this fact, intention classification is transformed into a Positive-Unlabeled learning problem. A second fact, that an image manipulated with harmful intention implies the article is fake, is used to check the reliability of the predictions.

Finally, a series of experiments validates the effectiveness and superiority of the proposed method.
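The summary does not say which Positive-Unlabeled objective the authors use. As an illustration only, the sketch below computes the non-negative PU risk estimator with a sigmoid surrogate loss, a common choice in the PU learning literature and not necessarily the one in HAMI-M3D. The function names, the class prior `prior`, and the example scores are all assumptions made for this sketch. For intention classification, the labeled positives could be manipulated images from real articles (known to be harmless), with manipulated images from fake articles treated as unlabeled.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss sigmoid(-label * score), with label in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk (illustrative; not confirmed as the paper's objective).

    pos_scores: classifier scores for samples known to be positive.
    unl_scores: classifier scores for unlabeled samples (a mix of both classes).
    prior:      assumed fraction of positives among the unlabeled data.
    """
    # Risk of labeled positives when treated as positive.
    risk_pos_as_pos = mean([sigmoid_loss(s, +1) for s in pos_scores])
    # Risk of labeled positives when treated as negative; used to correct
    # the unlabeled term for the positives hidden inside it.
    risk_pos_as_neg = mean([sigmoid_loss(s, -1) for s in pos_scores])
    # Risk of unlabeled samples when treated as negative.
    risk_unl_as_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])

    # Estimated negative-class risk; it can dip below zero on finite samples,
    # so the non-negative variant clamps it at zero.
    neg_risk = risk_unl_as_neg - prior * risk_pos_as_neg
    return prior * risk_pos_as_pos + max(0.0, neg_risk)


if __name__ == "__main__":
    # Hypothetical scores: higher means "more likely positive".
    positives = [2.0, 1.5, 0.3]
    unlabeled = [1.0, -0.5, -2.0, 0.1]
    print(nn_pu_risk(positives, unlabeled, prior=0.4))
```

In practice the class prior is rarely known and has to be estimated or tuned, and a training loop would minimize this quantity over mini-batches in an automatic-differentiation framework rather than evaluate it on plain lists.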

Harmfully Manipulated Images Matter in Multimodal Misinformation Detection
