Albanian fake news detection

Ercan Canhasi, Rexhep Shijaku, Erblin Berisha
DOI: https://doi.org/10.1145/3487288
2022-02-03
Abstract: Recent years have witnessed a sharp rise in the phenomenon known as fake news. Among the main reasons for this rise are the continuous growth of internet and social media usage and the opportunity they offer for real-time information dissemination. Deceptive and misleading content such as fake news, especially content made by and for social media users, is becoming increasingly hazardous. Hence, fake news detection has become an important research topic. Despite recent advances in fake news detection, the lack of fake news corpora for under-resourced languages is compromising the development and evaluation of existing approaches for these languages. To fill this gap, in this paper we investigate fake news detection for the Albanian language. We present a new public data set of labeled true and fake news in Albanian and perform an extensive analysis of machine learning methods for fake news detection. We performed comprehensive feature engineering and feature selection experiments. In doing so, we explored feature categories related to the Albanian language, such as lexical, syntactic, lying-detection, and psycho-linguistic features. Each article was also modeled in four different ways: with the traditional bag-of-words (BoW) representation and with three distributed text representations based on the state-of-the-art Word2Vec, FastText, and BERT methods. Additionally, we investigated the best combination of features and various types of classification methods. Finally, we draw conclusions from the experiments and evaluation results, which shed light on the potential of the methods and on the challenges that Albanian fake news detection presents.
computer science, artificial intelligence