Detection of Online Fake News Using N-Gram Analysis and Machine Learning Techniques

Hadeer Ahmed, Issa Traore, Sherif Saad
DOI: https://doi.org/10.1007/978-3-319-69155-8_9
2017-01-01
Abstract: Fake news is a phenomenon that is having a significant impact on our social life, in particular in the political world. Fake news detection is an emerging research area that is gaining interest but involves some challenges due to the limited amount of resources (i.e., datasets, published literature) available. In this paper, we propose a fake news detection model that uses n-gram analysis and machine learning techniques. We investigate and compare two different feature extraction techniques and six different machine learning classification techniques. Experimental evaluation yields the best performance using Term Frequency-Inverse Document Frequency (TF-IDF) as the feature extraction technique and Linear Support Vector Machine (LSVM) as the classifier, with an accuracy of 92%.
What problem does this paper attempt to address?
This paper addresses the detection of online fake news: how to distinguish real news from fake news using n-gram analysis and machine learning techniques. The problem matters because fake news has a significant impact on social life, particularly in politics. Fake news detection is an emerging research area, and it is challenging in part because few resources (datasets, published literature) are available. The authors propose a detection model based on n-gram analysis and machine learning, and they compare two feature extraction techniques and six classification techniques. The best result comes from term frequency-inverse document frequency (TF-IDF) features combined with a linear support vector machine (LSVM) classifier, which reaches an accuracy of 92%. This suggests that the choice of feature extraction method and classifier has a material effect on detection accuracy.
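The pipeline described above (word n-grams, TF-IDF weighting, a linear classifier) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes whitespace tokenization, the log(N/df) variant of IDF, and a plain subgradient-descent trainer on the regularized hinge loss as a stand-in for a linear SVM. All function names, hyperparameters, and the four toy headlines are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams (as tuples) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n=1):
    """Map each document to a sparse {n-gram: TF-IDF weight} dictionary.

    TF is the relative frequency of the n-gram in the document and
    IDF is log(N / df); this is one common variant (an assumption here).
    """
    grams_per_doc = [word_ngrams(d, n) for d in docs]
    df = Counter(g for grams in grams_per_doc for g in set(grams))
    n_docs = len(docs)
    vectors = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams) or 1  # avoid division by zero for very short texts
        vectors.append({g: (c / total) * math.log(n_docs / df[g])
                        for g, c in counts.items()})
    return vectors


def train_linear_svm(vectors, labels, epochs=200, lr=0.1, lam=0.01):
    """Fit weights w and bias b by subgradient descent on the hinge loss.

    Labels must be +1 or -1. Hyperparameters are illustrative assumptions.
    """
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            margin = y * (sum(w.get(g, 0.0) * v for g, v in x.items()) + b)
            for g in w:  # L2 regularization: shrink every weight slightly
                w[g] *= 1 - lr * lam
            if margin < 1:  # sample is inside the margin: push it outward
                for g, v in x.items():
                    w[g] = w.get(g, 0.0) + lr * y * v
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the sign of the linear score."""
    score = sum(w.get(g, 0.0) * v for g, v in x.items()) + b
    return 1 if score >= 0 else -1


# Hypothetical toy corpus: +1 marks "fake", -1 marks "real".
docs = [
    "shocking secret cure doctors hate",
    "shocking miracle cure revealed",
    "senate passes budget bill",
    "court reviews budget ruling",
]
labels = [1, 1, -1, -1]

vectors = tfidf_vectors(docs, n=1)
w, b = train_linear_svm(vectors, labels)
predictions = [predict(w, b, x) for x in vectors]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print("training accuracy:", accuracy)
```

The accuracy printed here is measured on the same four documents used for training, so it only shows that the pieces fit together; it says nothing about generalization and is unrelated to the 92% reported in the paper. Passing `n=2` to `tfidf_vectors` switches the features from unigrams to bigrams.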