Abstract: Humans find it difficult to judge whether a rumor is true or false, yet current deep learning models can surpass humans and achieve excellent accuracy on many rumor datasets. In this paper, we investigate whether deep learning models that seem to perform well actually learn to detect rumors. We evaluate the models' ability to generalize to out-of-domain examples by fine-tuning BERT-based models on five real-world datasets and evaluating each against all test sets. The experimental results indicate that the models generalize poorly to unseen datasets; even common-sense rumors cannot be detected. Moreover, our experiments show that models take shortcuts and learn absurd knowledge when the rumor datasets have serious data pitfalls. As a result, simple rule-based modifications to the rumor text lead to inconsistent model predictions. To evaluate rumor detection models more realistically, we propose a new evaluation method called paired test (PairT), which requires a model to correctly predict both samples in a test pair at the same time. Finally, we make recommendations on how to better create rumor datasets and evaluate rumor detection models.
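The pair-level scoring idea behind PairT can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the label convention (1 = true claim, 0 = rumor), and the toy predictor are assumptions made for illustration, and the abstract does not say how the paper constructs its pairs. The sketch only shows the stated requirement that a pair counts as correct when both of its samples are predicted correctly.

```python
from typing import Callable, Sequence, Tuple

# Assumed label convention for this sketch: 1 = true claim, 0 = rumor.
Example = Tuple[str, int]
Pair = Tuple[Example, Example]


def single_accuracy(predict: Callable[[str], int], pairs: Sequence[Pair]) -> float:
    """Conventional accuracy: every sample is scored independently."""
    examples = [example for pair in pairs for example in pair]
    if not examples:
        raise ValueError("at least one pair is required")
    return sum(predict(text) == label for text, label in examples) / len(examples)


def paired_accuracy(predict: Callable[[str], int], pairs: Sequence[Pair]) -> float:
    """Pair-level accuracy: a pair counts only if BOTH samples are correct."""
    if not pairs:
        raise ValueError("at least one pair is required")
    hits = sum(
        predict(a_text) == a_label and predict(b_text) == b_label
        for (a_text, a_label), (b_text, b_label) in pairs
    )
    return hits / len(pairs)


def always_true(text: str) -> int:
    """Hypothetical degenerate model that labels every input as true."""
    return 1


toy_pairs = [
    (
        ("The neighbor's pet dog gave birth to a cat", 0),
        ("Dogs can only give birth to dogs, and cats can only give birth to cats", 1),
    ),
]

print(single_accuracy(always_true, toy_pairs))  # 0.5
print(paired_accuracy(always_true, toy_pairs))  # 0.0
```

In this toy case the degenerate predictor still gets half of the individual samples right, but it scores zero at the pair level, which is the kind of inconsistency a paired metric is meant to expose.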

What problem does this paper attempt to address?

The paper asks whether the deep learning models that currently perform well have really learned to detect rumors. The authors explore this through four sub-questions:

1. **Can the performance on individual rumor datasets be generalized to new datasets?** The authors fine-tune BERT-based models on five real-world datasets and evaluate them on all test sets. The models generalize poorly to unseen datasets and cannot even detect common-sense rumors.
2. **Can the model detect common-sense rumors?** The authors create a dataset of common-sense rumors and find that the model's performance on it is close to random guessing, indicating that the model has not really learned to detect even these simple rumors.
3. **Are the model's prediction results trustworthy and consistent?** Case analysis shows that the model's predictions are inconsistent. For example, the model may judge "The neighbor's pet dog gave birth to a cat" to be true while also judging "Dogs can only give birth to dogs, and cats can only give birth to cats" to be true, which is contradictory.
4. **What has the model learned from the rumor datasets?** Using a word-level attention mechanism to analyze which words the model focuses on, the authors find that the model may rely on specific cues in the dataset (such as "Obama", "Paul", and "Sydney") rather than truly understanding the text. This dependence leads to a significant drop in performance on adversarial datasets.

Overall, the paper aims to reveal the limitations of current deep learning models on the rumor detection task and offers suggestions for improving datasets and evaluation methods so that models become more reliable and generalize better.
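The cross-dataset protocol in the first sub-question (train on one dataset, evaluate on every test set) can be sketched as a small accuracy matrix. This is a hedged illustration only: `cross_dataset_matrix`, `train_majority`, and the toy datasets are hypothetical, and a majority-class baseline stands in for the fine-tuned BERT-based models, which are not reproduced here.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]          # (text, label)
Model = Callable[[str], int]       # maps text to a predicted label
Splits = Dict[str, List[Example]]  # {"train": [...], "test": [...]}


def accuracy(model: Model, examples: List[Example]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(model(text) == label for text, label in examples) / len(examples)


def cross_dataset_matrix(
    datasets: Dict[str, Splits],
    train: Callable[[List[Example]], Model],
) -> Dict[str, Dict[str, float]]:
    """Train one model per dataset, then score each model on every test set."""
    models = {name: train(splits["train"]) for name, splits in datasets.items()}
    return {
        source: {
            target: accuracy(model, datasets[target]["test"])
            for target in datasets
        }
        for source, model in models.items()
    }


def train_majority(train_set: List[Example]) -> Model:
    """Stand-in trainer (assumption): always predict the majority training label."""
    majority = Counter(label for _, label in train_set).most_common(1)[0][0]
    return lambda text: majority


# Toy datasets with opposite label distributions, purely for illustration.
toy = {
    "A": {"train": [("a1", 1), ("a2", 1)], "test": [("a3", 1)]},
    "B": {"train": [("b1", 0), ("b2", 0)], "test": [("b3", 0)]},
}

matrix = cross_dataset_matrix(toy, train_majority)
for source, row in matrix.items():
    print(source, row)
```

The diagonal of the matrix holds in-domain accuracy and the off-diagonal entries hold out-of-domain accuracy; a large gap between the two is the symptom of poor generalization that the summary describes.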

True or False: Does the Deep Learning Model Learn to Detect Rumors?

Interpretable Graph Neural Network for Social Media Rumor Detection

Detecting the Rumor Patterns Integrating Features of User, Content, and the Spreading Structure.

Research Status of Deep Learning Methods for Rumor Detection

Call Attention to Rumors: Deep Attention Based Recurrent Neural Networks for Early Rumor Detection

A Novel and High-Accuracy Rumor Detection Approach Using Kernel Subtree and Deep Learning Networks

Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets

Detecting False Rumors from Retweet Dynamics on Social Media

HAT4RD: Hierarchical Adversarial Training for Rumor Detection in Social Media

Detect Rumors in Microblog Posts for Low-Resource Domains via Adversarial Contrastive Learning

Neural network approaches for rumor stance detection: Simulating complex rumor propagation systems

An End-to-End Rumor Detection Model Based on Feature Aggregation

A Novel Fine-Grained Rumor Detection Algorithm with Attention Mechanism

Can Large Language Models Detect Rumors on Social Media?

On Early-stage Debunking Rumors on Twitter: Leveraging the Wisdom of Weak Learners

A Comprehensive Low and High-level Feature Analysis for Early Rumor Detection on Twitter

Interpretable Rumor Detection in Microblogs by Attending to User Interactions

Ensemble Deep Learning on Time-Series Representation of Tweets for Rumor Detection in Social Media

A C-GRU Neural Network for Rumors Detection

Modeling microscopic and macroscopic information diffusion for rumor detection

TSNN: A Topic and Structure Aware Neural Network for Rumor Detection