A hybrid classification method for Twitter spam detection based on differential evolution and random forest

Sepideh Bazzaz Abkenar, Ebrahim Mahdipour, Seyed Mahdi Jameii, Mostafa Haghi Kashani
DOI: https://doi.org/10.1002/cpe.6381
2021-06-04
Concurrency and Computation: Practice and Experience
Abstract: Social networking services are online platforms that are distributed across different computers over long distances. Twitter is the most popular microblogging site that allows users to share their opinions and real-world events. Due to its popularity and ease of use, Twitter has also attracted spammers. As a result, spam detection is one of the most critical problems. In order to provide a spam-free environment, it is necessary to identify and filter spam tweets as well as their owners. A hybrid method, which is based on Synthetic Minority Over-sampling TEchnique (SMOTE) and Differential Evolution (DE) strategies, is presented to enhance the spam detection rate in real Twitter datasets. SMOTE is applied to tackle the imbalanced class distribution of datasets, while DE is used to tune Random Forest (RF) hyperparameters. Compared with related work and based on evaluation results, the presented method significantly enhances the classification performance in imbalanced datasets. The detection rate of optimized RF with excellent F1-score and Area Under the Receiver Operating Characteristic Curve (AUROC), which are 98.97% and 0.999, respectively, demonstrates the high efficiency of the proposed method.
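The two building blocks named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not state the DE variant, its control parameters, or which RF hyperparameters were tuned, so the DE/rand/1/bin scheme, the values of `f` and `cr`, the search bounds, and the two tuned parameters below are all assumptions. The function `toy_objective` is a hypothetical stand-in for the real objective, which would plausibly be something like one minus the cross-validated F1-score of a random forest trained on the SMOTE-balanced data.

```python
import random

def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def differential_evolution(objective, bounds, pop_size=15, f=0.8, cr=0.9,
                           generations=60, rng=None):
    """Minimise objective over box bounds with DE/rand/1/bin (assumed variant)."""
    rng = rng or random.Random(0)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # guarantees one gene from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    value = min(max(value, lo), hi)  # clip to the bounds
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

def toy_objective(params):
    """Hypothetical placeholder for '1 - cross-validated F1 of the RF'."""
    n_trees, max_depth = params
    return (n_trees - 200.0) ** 2 / 1e4 + (max_depth - 12.0) ** 2 / 1e2

# Hypothetical minority-class (spam) feature vectors.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
synthetic = smote(minority, n_new=6)

# Assumed search space: number of trees and maximum tree depth.
bounds = [(10, 500), (2, 30)]
best_params, best_score = differential_evolution(toy_objective, bounds)
```

In a real pipeline the continuous values in `best_params` would be rounded to integers before being passed to the classifier, and oversampling would be applied to the training folds only, so that synthetic points do not leak into the evaluation data.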
What problem does this paper attempt to address?