FakeClaim: A Multiple Platform-driven Dataset for Identification of Fake News on 2023 Israel-Hamas War

Gautam Kishore Shahi, Amit Kumar Jaiswal, Thomas Mandl
2024-01-30
Abstract: We contribute the first publicly available dataset of factual claims from different platforms and fake YouTube videos on the 2023 Israel-Hamas war for automatic fake YouTube video classification. The FakeClaim data is collected from 60 fact-checking organizations in 30 languages and enriched with metadata from the fact-checking organizations curated by trained journalists specialized in fact-checking. Further, we classify fake videos within the subset of YouTube videos using textual information and user comments. We used a pre-trained model to classify each video with different feature combinations. Our best-performing fine-tuned language model, Universal Sentence Encoder (USE), achieves a Macro F1 of 87%, which shows that the trained model can be helpful for debunking fake videos using the comments from the user discussion. The dataset is available on GitHub.
Information Retrieval, Social and Information Networks
What problem does this paper attempt to address?
The paper addresses fake news related to the 2023 Israel-Hamas war, focusing on false information circulating on social media platforms. The authors constructed a dataset named FakeClaim, which includes factual statements about the conflict collected from fact-checking organizations, as well as YouTube videos related to these statements. The goals of the paper can be summarized as follows:

1. **Constructing the FakeClaim dataset**: The authors collected 1,499 statements, extracted from 1,370 published fact-checking articles by 60 fact-checking organizations and covering 30 languages. Data related to these statements were also collected from social media platforms such as Facebook, Twitter, and YouTube.
2. **YouTube video classification**: The paper focuses in particular on identifying fake news in YouTube videos. The researchers used textual information, user comments, and contextual evidence to classify the videos, aiming to distinguish true from false content.
3. **Model evaluation**: The authors used pre-trained models (e.g., Universal Sentence Encoder and RoBERTa) to evaluate how different feature combinations perform on the video classification task. The reported results indicate that combining multiple features (such as video titles, comments, and fact-checking statements) improves the model's performance.

Through these efforts, the paper aims to provide an effective method for automatically identifying fake news on social media and to help reduce the spread of misinformation.
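The abstract reports classification quality as Macro F1, the unweighted mean of the per-class F1 scores, so the smaller class counts as much as the larger one. The following is a minimal sketch of that metric in plain Python. The function name `macro_f1` and the toy labels are assumptions made for illustration; they are not the authors' evaluation code or data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); guard against an empty class.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels invented for illustration only (not taken from the paper).
y_true = ["fake", "fake", "fake", "fake", "real", "real"]
y_pred = ["fake", "fake", "fake", "fake", "fake", "real"]

score = macro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"macro F1 = {score:.3f}, accuracy = {accuracy:.3f}")
```

On this imbalanced toy example, the macro-averaged score is lower than plain accuracy because the one misclassified "real" video weighs heavily on the F1 of the minority class.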