Abstract: International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Volume 32, Issue 01, Page 1-20, January 2024. Social media networks such as Facebook and Twitter have evolved into valuable platforms for global communication. However, because of its extensive user base, Twitter is often misused by illegitimate users engaging in illicit activities. Numerous research papers address the problem of combating illegitimate users on Twitter, but most of them fail to address class imbalance, which significantly impairs the effectiveness of spam detection. The few works that do address class imbalance have not yet applied bio-inspired algorithms to balance the dataset. We therefore introduce PSOB-U, a particle swarm optimization-based undersampling technique designed to balance the Twitter dataset. In PSOB-U, various classifiers and metrics are employed to select majority samples and rank them. Furthermore, an ensemble learning approach combines the base classifiers in three stages. During the training phase of the base classifiers, undersampling techniques and a cost-sensitive random forest (CS-RF) are used to address the imbalanced data at both the data level and the algorithmic level. In the first stage, the imbalanced dataset is balanced using random undersampling, particle swarm optimization-based undersampling, and random oversampling. In the second stage, a classifier is constructed for each of the balanced datasets obtained through these sampling techniques. In the third stage, a majority voting method aggregates the predicted outputs of the three classifiers. The evaluation results demonstrate that our proposed method significantly improves the detection of illegitimate users in the imbalanced Twitter dataset.
Additionally, we compare our proposed work with existing models, and the results indicate that our spam detection model outperforms state-of-the-art spam detection models that address the class imbalance problem. The combination of particle swarm optimization-based undersampling and ensemble learning with majority voting results in more accurate spam detection.
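The three-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it is hypothetical. Two substitutions are made to keep it short and dependency-free: `ranked_undersample` is a simple distance-based ranking that stands in for PSOB-U (it performs no particle swarm optimization), and `NearestCentroid` stands in for the cost-sensitive random forest. Only the overall structure (three balanced datasets, one classifier per dataset, majority voting) follows the abstract. The sketch assumes binary labels and a minority class no larger than the majority class.

```python
import random
from collections import Counter

MINORITY, MAJORITY = 1, 0  # assumed labels: 1 = spam (rare), 0 = legitimate (common)

def split_by_class(X, y):
    minority = [x for x, label in zip(X, y) if label == MINORITY]
    majority = [x for x, label in zip(X, y) if label == MAJORITY]
    return minority, majority

def rebuild(minority, majority):
    # Reassemble a dataset from per-class sample lists.
    return minority + majority, [MINORITY] * len(minority) + [MAJORITY] * len(majority)

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def random_undersample(X, y, rng):
    # Keep a random subset of majority samples, equal in size to the minority class.
    minority, majority = split_by_class(X, y)
    return rebuild(minority, rng.sample(majority, len(minority)))

def random_oversample(X, y, rng):
    # Duplicate random minority samples until both classes have the same size.
    minority, majority = split_by_class(X, y)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return rebuild(minority + extra, majority)

def ranked_undersample(X, y):
    # Placeholder for PSOB-U (NOT particle swarm optimization): rank majority
    # samples by distance to the minority centroid and keep the closest ones.
    minority, majority = split_by_class(X, y)
    target = centroid(minority)
    ranked = sorted(majority, key=lambda x: sq_dist(x, target))
    return rebuild(minority, ranked[:len(minority)])

class NearestCentroid:
    # Stand-in base classifier; the abstract uses a cost-sensitive random forest.
    def fit(self, X, y):
        minority, majority = split_by_class(X, y)
        self.centroids = {MINORITY: centroid(minority), MAJORITY: centroid(majority)}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c]))
                for x in X]

def train_ensemble(X, y, seed=0):
    rng = random.Random(seed)
    # Stage 1: build three balanced versions of the imbalanced dataset.
    balanced_sets = [
        random_undersample(X, y, rng),
        ranked_undersample(X, y),
        random_oversample(X, y, rng),
    ]
    # Stage 2: train one base classifier per balanced dataset.
    return [NearestCentroid().fit(Xb, yb) for Xb, yb in balanced_sets]

def predict_majority(models, X):
    # Stage 3: each model casts one vote per sample; the most common label wins.
    votes = zip(*(model.predict(X) for model in models))
    return [Counter(v).most_common(1)[0][0] for v in votes]
```

Calling `train_ensemble(X, y)` on a list of feature vectors and a list of 0/1 labels returns the three fitted models, and `predict_majority(models, X_new)` returns one voted label per sample. With three binary voters a tie cannot occur, which is one practical reason an odd number of base classifiers suits majority voting.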

Bio-Inspired Algorithm Based Undersampling Approach and Ensemble Learning for Twitter Spam Detection
