Yesterday's News: Benchmarking Multi-Dimensional Out-of-Distribution Generalisation of Misinformation Detection Models

Ivo Verhoeven, Pushkar Mishra, Ekaterina Shutova
2024-10-12
Abstract: This paper introduces misinfo-general, a benchmark dataset for evaluating misinformation models' ability to perform out-of-distribution generalisation. Misinformation changes rapidly, much quicker than moderators can annotate at scale, resulting in a shift between the training and inference data distributions. As a result, misinformation models need to be able to perform out-of-distribution generalisation, an understudied problem in existing datasets. We identify 6 axes of generalisation (time, event, topic, publisher, political bias, misinformation type) and design evaluation procedures for each. We also analyse some baseline models, highlighting how these fail important desiderata.
Information Retrieval, Computation and Language
What problem does this paper attempt to address?
The main problem this paper addresses is the poor performance of current misinformation detection models on out-of-distribution (OoD) data. Misinformation changes much faster than it can be manually annotated, so the training data and the inference data end up with very different distributions. Existing models cope poorly with this shift, in particular along the following six dimensions:

1. **Time dimension**: As time passes and news events unfold, the form and content of misinformation keep changing, and the model needs to adapt to new points in time.
2. **Event dimension**: Misinformation about different events has different characteristics, and the model needs to generalize to unseen events.
3. **Topic dimension**: Misinformation on different topics may use different language styles and expressions, and the model needs to recognize content across topics.
4. **Publisher dimension**: Different publishers have different writing styles and tendencies, and the model needs to adapt to all types of publishers.
5. **Political bias dimension**: Publishers with different political leanings may frame their narratives differently, and the model needs to handle content across political biases.
6. **Misinformation type dimension**: Different types of misinformation (such as conspiracy theories and pseudoscience) have different characteristics, and the model needs to recognize each type.

To evaluate and improve models' out-of-distribution generalization along these six dimensions, the authors introduce a new benchmark dataset named misinfo-general and design a corresponding evaluation procedure for each dimension.
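As a rough illustration of what evaluating along one of these dimensions can look like, the sketch below holds out every article with a chosen value on one axis, so the test set contains only values the model never saw in training. This is not the paper's evaluation procedure; the function `held_out_split`, the record fields, and the toy data are all hypothetical.

```python
def held_out_split(records, axis, held_out_values):
    """Send every record whose `axis` value is held out to the test set.

    Hypothetical helper: `records` is a list of dicts and `axis` is a key
    such as "publisher" or "year". Nothing here comes from the paper.
    """
    held_out = set(held_out_values)
    train = [r for r in records if r[axis] not in held_out]
    test = [r for r in records if r[axis] in held_out]
    return train, test


# Toy, made-up records purely for illustration.
articles = [
    {"text": "article 1", "publisher": "pub_a", "year": 2019, "label": 0},
    {"text": "article 2", "publisher": "pub_a", "year": 2020, "label": 0},
    {"text": "article 3", "publisher": "pub_b", "year": 2020, "label": 1},
    {"text": "article 4", "publisher": "pub_c", "year": 2021, "label": 1},
    {"text": "article 5", "publisher": "pub_c", "year": 2022, "label": 0},
]

# Publisher axis: the model never trains on anything from pub_c.
train, test = held_out_split(articles, "publisher", {"pub_c"})

# Time axis: the same helper, holding out the most recent year instead.
train_time, test_time = held_out_split(articles, "year", {2022})
```

The same pattern applies to any axis that can be expressed as a per-article attribute; comparing scores on such a held-out split with scores on a random split gives a simple estimate of the generalization gap.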
Through this dataset, researchers can better understand and improve how misinformation detection models perform on out-of-distribution data, and thereby how well the models work in practice.

### Formula examples

Although the paper does not involve complex mathematical formulas, it uses some metrics to describe model performance, such as the Matthews correlation coefficient (MCC) and the F1-score. They are defined as follows:

- **Matthews correlation coefficient (MCC)**:
  \[ MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \]
  where \(TP\) is the number of true positives, \(TN\) is the number of true negatives, \(FP\) is the number of false positives, and \(FN\) is the number of false negatives.
- **F1-score**:
  \[ F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall} \]
  where \(Precision = \frac{TP}{TP + FP}\) and \(Recall = \frac{TP}{TP + FN}\).

These metrics are used to measure model performance along the different generalization axes, helping researchers understand each model's strengths and weaknesses.
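The two definitions above can be computed directly from the four confusion-matrix counts. The following is a minimal sketch; the function names and the example counts are illustrative assumptions, and returning 0 when a denominator is zero is a common convention rather than something the paper specifies.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when any marginal total is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f1_score(tp, fp, fn):
    """F1-score as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts for a binary classifier evaluated on 100 articles.
example_mcc = mcc(tp=40, tn=45, fp=5, fn=10)
example_f1 = f1_score(tp=40, fp=5, fn=10)
```

Note that MCC uses all four counts, including true negatives, whereas F1 ignores true negatives; this is one reason the two can disagree on imbalanced data.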