Fighting the COVID-19 Infodemic with Supervised Machine Learning, Computational Linguistics, and Network Science (Preprint)

Mohammad AR Abdeen, Ahmed Abdeen Hamed, Xindong Wu
DOI: https://doi.org/10.2196/preprints.26785
2020-01-01
Abstract:
BACKGROUND: The spread of the coronavirus pandemic has been accompanied by an infodemic. The false information embedded in the infodemic undermines people's ability to stay safe and to follow proper procedures for mitigating risk.
OBJECTIVE: This research targets the falsehood part of the infodemic, which proliferates prominently in news articles. Specifically, we present a computational approach that predicts whether a news article falls under the category of COVID-19 safe or suspicious.
METHODS: We present a novel supervised machine learning and computational linguistic approach that analyzes the content of a given news article and assigns a label to it. In particular, we designed an algorithm, which we call NeoNet, that is trained on a network of noun-phrases selected from a trustworthy COVID-19 news dataset. Noun-phrases are known to capture facts and eliminate subjectivity. Once trained, the algorithm predicts a label for new articles and decides whether an article is suspicious.
RESULTS: The results show that the NeoNet algorithm predicts the label of an article with 98.8% precision using a non-pruned model and 95.8% precision using a pruned model. In the pruned setting, NeoNet surpassed NaiveBayes in three of five comparisons, while the other two were too close to call. Without pruning, NeoNet outperformed NaiveBayes in all five experiments.
CONCLUSIONS: The infodemic that has accompanied the COVID-19 pandemic presents a significant challenge because of the spread of misinformation, disinformation, fake news, rumors, and conspiracy theories. However, machine learning combined with computational linguistic methods can provide the tools needed to inform the general public whether a news article is COVID-19 SAFE or DISPUTED (when it contains suspicious content).
CLINICAL TRIAL: N/A
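The abstract does not say how NeoNet builds or scores its noun-phrase network, so the following is only a rough sketch of the general idea and not the authors' algorithm. It assumes noun phrases have already been extracted from each article, that the network links phrases co-occurring in the same trusted article, and that an article is labeled by the share of its phrase pairs found in that network. The names `build_network`, `overlap_score`, `predict_label`, `min_weight`, and `threshold` are all hypothetical, the `min_weight` filter is merely one possible reading of "pruning", and the sample phrases are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def build_network(trusted_articles, min_weight=1):
    """Count how often two noun phrases co-occur in the same trusted article.

    Edges seen fewer than `min_weight` times are dropped. This filter is an
    assumed stand-in for pruning, not the paper's actual procedure.
    """
    edges = Counter()
    for phrases in trusted_articles:
        unique = sorted({p.lower() for p in phrases})
        edges.update(combinations(unique, 2))
    return {pair: w for pair, w in edges.items() if w >= min_weight}


def overlap_score(network, phrases):
    """Fraction of the article's noun-phrase pairs that are edges in the network."""
    pairs = list(combinations(sorted({p.lower() for p in phrases}), 2))
    if not pairs:
        return 0.0
    return sum(pair in network for pair in pairs) / len(pairs)


def predict_label(network, phrases, threshold=0.5):
    """Label an article SAFE when enough of its phrase pairs are known edges."""
    return "SAFE" if overlap_score(network, phrases) >= threshold else "DISPUTED"


# Illustrative, made-up noun phrases standing in for a trusted news dataset.
trusted = [
    ["face mask", "social distancing", "hand washing"],
    ["face mask", "social distancing", "vaccine trial"],
]
network = build_network(trusted)               # keeps every co-occurrence edge
pruned = build_network(trusted, min_weight=2)  # keeps only repeated edges

print(predict_label(network, ["Face Mask", "social distancing"]))
print(predict_label(network, ["miracle cure", "5g tower"]))
```

In this toy setup the threshold trades precision against recall: raising it labels fewer articles SAFE, and the pruned network, having fewer edges, is stricter for the same threshold.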