Abstract: The increasing proliferation of misinformation and its alarming impact have motivated both industry and academia to develop approaches for fake news detection. However, state-of-the-art approaches are usually trained on small datasets or on datasets restricted to a limited set of topics. As a consequence, these models lack generalization capabilities and are not applicable to real-world data. In this paper, we propose three models that adopt and fine-tune state-of-the-art multimodal transformers for multimodal fake news detection. We conduct an in-depth analysis by manipulating the input data to explore the models' performance in realistic use cases on social media. Our study across multiple models demonstrates that these systems suffer significant performance drops on manipulated data. To reduce the bias and improve model generalization, we suggest training data augmentation to conduct more meaningful experiments for fake news detection on social media. The proposed data augmentation techniques enable models to generalize better and yield improved state-of-the-art results.

What problem does this paper attempt to address?

The paper primarily addresses the issue of model generalization in multimodal fake news detection. The authors point out that current methods are often trained on smaller or thematically limited datasets, leading to a lack of sufficient generalization when these models face real-world data. To solve this problem, the paper makes the following contributions:

1. **Proposing Three Multimodal Models**: Based on the latest Transformer architecture, three models are designed to handle the task of multimodal fake news detection on social media. These models are:
   - BERT-ResNet Model: Combines BERT for text encoding and ResNet for image feature extraction.
   - MLP-CLIP Model: Utilizes the CLIP model to extract multimodal features from images and text.
   - CLIP-MMBT Model: Employs a combination of CLIP and MMBT (a multimodal fusion method) for classification.
2. **In-depth Analysis of Model Performance**: By altering the input data, the paper explores the performance of these models in real-world application scenarios and finds that the performance of existing systems drops significantly when the data is manipulated.
3. **Data Augmentation Techniques**: To reduce bias and improve model generalization, the paper suggests using data augmentation techniques. Specifically, introducing the large-scale Visual News dataset, which contains news from a wide range of topics, improves the model's performance on fake news detection on social media.
4. **Ensemble Method**: Furthermore, the paper introduces an ensemble method, which combines models trained with different manipulation techniques to make predictions, thereby further enhancing the model's generalization ability.

Experimental results show that the improved data augmentation techniques and ensemble methods effectively enhance the model's adaptability and accuracy on unseen data. In particular, the MLP-CLIP model combined with data augmentation techniques achieved the best overall results on multiple test sets.
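
The ensemble idea described in this summary can be illustrated with a minimal sketch. The summary does not say how the individual models' predictions are combined, so the code below assumes simple soft voting (averaging each model's predicted probability that a post is fake). The function names `soft_vote` and `to_labels`, the 0.5 threshold, and the example scores are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def soft_vote(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-sample 'fake' probabilities across several models.

    Assumption: each inner sequence holds one model's probabilities
    for the same samples, in the same order.
    """
    if not model_probs:
        raise ValueError("need predictions from at least one model")
    n_samples = len(model_probs[0])
    if any(len(probs) != n_samples for probs in model_probs):
        raise ValueError("all models must score the same samples")
    n_models = len(model_probs)
    return [
        sum(probs[i] for probs in model_probs) / n_models
        for i in range(n_samples)
    ]


def to_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map averaged probabilities to labels (1 = fake, 0 = real)."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical scores from two models, each trained with a
# different input manipulation, for the same two posts.
scores_model_a = [0.9, 0.2]
scores_model_b = [0.7, 0.4]

averaged = soft_vote([scores_model_a, scores_model_b])
labels = to_labels(averaged)
```

Averaging probabilities rather than taking a majority vote over hard labels keeps each model's confidence in play, which is one common choice when the ensemble members were trained on differently manipulated data. Other combination rules, such as weighted averaging, would fit the same interface.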

Improving Generalization for Multimodal Fake News Detection

Cross-Modal Augmentation for Few-Shot Multimodal Fake News Detection

Multi-modal Fake News Detection on Social Media via Multi-grained Information Fusion

Semantic‐enhanced multimodal fusion network for fake news detection

Deep Learning Multimodal Methods to Detect Fake News

A Self-Learning Multimodal Approach for Fake News Detection

Ensemble Techniques for Robust Fake News Detection: Integrating Transformers, Natural Language Processing, and Machine Learning

TLFND: A Multimodal Fusion Model Based on Three-Level Feature Matching Distance for Fake News Detection

Multimodal fake news detection on social media: a survey of deep learning techniques

Combating multimodal fake news on social media: methods, datasets, and future perspective

Fake news detection based on a hybrid BERT and LightGBM models

Multimodal fake news detection via progressive fusion networks

Knowledge-aware multimodal pre-training for fake news detection

Evaluating Generalizability of Fine-Tuned Models for Fake News Detection

Knowledge augmented transformer for adversarial multidomain multiclassification multimodal fake news detection

Text-image multimodal fusion model for enhanced fake news detection

Towards better representation learning using hybrid deep learning model for fake news detection

Multimodal Fake News Detection Incorporating External Knowledge and User Interaction Feature

ETMA: Efficient Transformer Based Multilevel Attention framework for Multimodal Fake News Detection

Fake News Detection Based on Text-Modal Dominance and Fusing Multiple Multi-Model Clues

An emotion-driven, transformer-based network for multimodal fake news detection