Improving Generalization for Multimodal Fake News Detection

Sahar Tahmasebi, Sherzod Hakimov, Ralph Ewerth, Eric Müller-Budack
DOI: https://doi.org/10.1145/3591106.3592230
2023-05-30
Abstract: The increasing proliferation of misinformation and its alarming impact have motivated both industry and academia to develop approaches for fake news detection. However, state-of-the-art approaches are usually trained on datasets of smaller size or with a limited set of specific topics. As a consequence, these models lack generalization capabilities and are not applicable to real-world data. In this paper, we propose three models that adopt and fine-tune state-of-the-art multimodal transformers for multimodal fake news detection. We conduct an in-depth analysis by manipulating the input data to explore model performance in realistic use cases on social media. Our study across multiple models demonstrates that these systems suffer significant performance drops against manipulated data. To reduce the bias and improve model generalization, we suggest training data augmentation to conduct more meaningful experiments for fake news detection on social media. The proposed data augmentation techniques enable models to generalize better and yield improved state-of-the-art results.
Computation and Language, Information Retrieval, Machine Learning, Multimedia
What problem does this paper attempt to address?
The paper primarily addresses model generalization in multimodal fake news detection. The authors point out that current methods are often trained on small or topically limited datasets, so the resulting models generalize poorly to real-world data. To address this, the paper makes the following contributions:

1. **Three Multimodal Models**: Three models built on state-of-the-art transformer architectures are designed for multimodal fake news detection on social media:
   - BERT-ResNet Model: Combines BERT for text encoding with ResNet for image feature extraction.
   - MLP-CLIP Model: Uses the CLIP model to extract multimodal features from images and text.
   - CLIP-MMBT Model: Combines CLIP with MMBT (a multimodal fusion method) for classification.
2. **In-depth Analysis of Model Performance**: By manipulating the input data, the paper explores how these models behave in realistic application scenarios and finds that the performance of existing systems drops significantly on manipulated data.
3. **Data Augmentation Techniques**: To reduce bias and improve generalization, the paper suggests augmenting the training data. Specifically, it introduces the large-scale Visual News dataset, which covers a wide range of topics, to improve performance on fake news detection on social media.
4. **Ensemble Method**: The paper also introduces an ensemble method that combines the predictions of models trained with different manipulation techniques, further improving generalization.

Experimental results show that the proposed data augmentation techniques and ensemble method improve the models' adaptability and accuracy on unseen data. In particular, the MLP-CLIP model combined with data augmentation achieved the best overall results on multiple test sets.
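
The ensemble idea described above can be illustrated with a minimal sketch. The summary does not state how the individual predictions are combined, so the averaging rule, the 0.5 threshold, the `ensemble_predict` function, and the stand-in models below are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

# Assumption: a "model" is any callable that maps a post (text, image path)
# to the probability that the post is fake.
Model = Callable[[str, str], float]


def ensemble_predict(
    models: Sequence[Model],
    text: str,
    image_path: str,
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Average P(fake) over all models and compare it with a threshold.

    Simple averaging is an assumed combination rule; other rules
    (e.g. majority voting) would fit the same description.
    """
    if not models:
        raise ValueError("ensemble needs at least one model")
    score = mean(model(text, image_path) for model in models)
    return score, score >= threshold


# Hypothetical stand-ins for classifiers that were each fine-tuned on
# training data manipulated in a different way. Real models would encode
# the text and image; these return fixed scores for demonstration only.
def model_a(text: str, image_path: str) -> float:
    return 0.9


def model_b(text: str, image_path: str) -> float:
    return 0.6


def model_c(text: str, image_path: str) -> float:
    return 0.3


if __name__ == "__main__":
    score, is_fake = ensemble_predict(
        [model_a, model_b, model_c], "example post text", "example.jpg"
    )
    print(f"P(fake) = {score:.2f}, predicted fake: {is_fake}")
```

Because each ensemble member is just a callable, models trained on differently manipulated data can be swapped in or out without changing the combination logic.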