Exploring the Role of Audio in Multimodal Misinformation Detection

Moyang Liu, Yukun Liu, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Xuefei Liu, Guanjun Li

2024-08-23

Abstract: With the rapid development of deepfake technology, especially audio deepfake technology, misinformation detection on social media faces a major challenge. Social media data is often multimodal, combining audio, video, text, and images. However, existing multimodal misinformation detection methods tend to focus on only some of these modalities and fail to address information from all of them. To handle the full range of modalities that may appear on social media, this paper constructs a comprehensive multimodal misinformation detection framework. By employing a corresponding neural network encoder for each modality, the framework fuses information from the different modalities and supports the multimodal misinformation detection task. Based on this framework, the paper explores the importance of the audio modality in multimodal misinformation detection on social media. By varying the architecture of the acoustic encoder, it investigates the effectiveness of different acoustic feature encoders for the task. Furthermore, the paper finds that audio and video information must be carefully aligned; otherwise, misalignment between the audio and video modalities can severely impair model performance.

Multimedia

What problem does this paper attempt to address?

The paper addresses multimodal misinformation detection on social media, which has become considerably harder with the rapid development of deepfake technology, especially audio deepfakes. Existing multimodal misinformation detection methods usually focus on only some modalities and do not process all of them together, which limits detection performance. The paper therefore constructs a comprehensive multimodal misinformation detection framework that uses a dedicated neural network encoder for each modality (audio, video, text, and images) and fuses their outputs for the detection task. Using this framework, it examines how important the audio modality is for multimodal misinformation detection and compares the effectiveness of different acoustic feature encoders. It also finds that audio and video information must be carefully aligned; otherwise, misalignment between these modalities can seriously degrade model performance.
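The encode-then-fuse idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the encoders, the fusion operator, or the alignment procedure. The sketch assumes concatenation as the fusion step, a random linear map with tanh as a stand-in for each trained encoder, and nearest-timestamp matching for audio-video alignment. All names (`make_encoder`, `fuse`, `classify`, `align_audio_to_video`) and all dimensions are hypothetical.

```python
import math
import random


def make_encoder(in_dim, out_dim, rng):
    """Hypothetical stand-in for a trained modality encoder:
    a random linear map followed by tanh."""
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(features):
        return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]

    return encode


def fuse(embeddings):
    """Assumed fusion step: concatenate per-modality embeddings
    in a fixed (sorted) modality order."""
    return [value for name in sorted(embeddings) for value in embeddings[name]]


def classify(fused, weights, bias=0.0):
    """Logistic head that maps the fused vector to a score in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def align_audio_to_video(num_video_frames, video_fps, audio_fps, num_audio_frames):
    """For each video frame, return the index of the audio frame nearest in time
    (assumed alignment rule), clamped to the last available audio frame."""
    return [
        min(round(i * audio_fps / video_fps), num_audio_frames - 1)
        for i in range(num_video_frames)
    ]


rng = random.Random(0)
# Assumed raw feature sizes per modality; every encoder outputs 4 values.
dims = {"audio": 6, "image": 5, "text": 4, "video": 8}
encoders = {name: make_encoder(d, 4, rng) for name, d in dims.items()}
sample = {name: [rng.random() for _ in range(d)] for name, d in dims.items()}

embeddings = {name: encoders[name](sample[name]) for name in dims}
fused = fuse(embeddings)
head = [rng.uniform(-1, 1) for _ in fused]  # untrained classifier weights
score = classify(fused, head)

# Audio feature frames at 100 per second paired with video at 25 frames per second.
audio_index = align_audio_to_video(4, 25, 100, 16)
```

Because the weights here are random, `score` carries no meaning; the point is only the data flow from per-modality features to one fused vector. Dropping the `"audio"` entry from `dims` gives a simple way to mimic an ablation of the audio modality, and shifting `audio_index` by a constant offset mimics the kind of audio-video misalignment the paper warns about.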

A Unified Framework for Modality-Agnostic Deepfakes Detection

VMID: A Multimodal Fusion LLM Framework for Detecting and Identifying Misinformation of Short Videos

Transferring Audio Deepfake Detection Capability Across Languages

A Multimodal Framework for Deepfake Detection

A Robust Approach to Multimodal Deepfake Detection

Multimodal fake news detection on social media: a survey of deep learning techniques

MIS-AVoiDD: Modality Invariant and Specific Representation for Audio-Visual Deepfake Detection

Detecting Misinformation in Multimedia Content through Cross-Modal Entity Consistency: A Dual Learning Approach

Multi-modal Fake News Detection on Social Media via Multi-grained Information Fusion

Cross-Modality and Within-Modality Regularization for Audio-Visual DeepFake Detection

Integrating Audio-Visual Features for Multimodal Deepfake Detection

Evaluation of an Audio-Video Multimodal Deepfake Dataset using Unimodal and Multimodal Detectors

Multimodaltrace: Deepfake Detection using Audiovisual Representation Learning

A Deepfake Video Detection Method Based on Multi-Modal Deep Learning Method

Magnifying multimodal forgery clues for Deepfake detection

A Multi-Stream Fusion Approach with One-Class Learning for Audio-Visual Deepfake Detection

Multimodal fake news detection via progressive fusion networks

MCL: Multimodal Contrastive Learning for Deepfake Detection

AVoiD-DF: Audio-Visual Joint Learning for Detecting Deepfake

Understanding Audiovisual Deepfake Detection: Techniques, Challenges, Human Factors and Perceptual Insights