Abstract: Technology has advanced markedly over time, affecting every field of life, including medicine, music, office work, travel, and communication. Telephone lines once served as the main communication medium; wireless technology has since overtaken wired telephony and offers much broader features. Advertising agencies and spammers mostly use SMS to push business promotions to ordinary users. As a result, spam reportedly accounts for more than 60% of the SMS messages received daily. These messages annoy users and sometimes defraud them, yet they generate large profits for spammers and advertising companies. This study proposes an approach for classifying SMS messages as spam or ham using supervised machine learning techniques. Term Frequency-Inverse Document Frequency (TF-IDF) and bag-of-words are used to extract features from the data. The SMS dataset was imbalanced, so we applied over-sampling and under-sampling techniques to address this. Support vector classifier, gradient boosting machine, random forest, Gaussian Naive Bayes, and logistic regression models are applied to the spam and ham SMS dataset, and their performance is evaluated using accuracy, precision, recall, and F1 score. The experimental results show that random forest classifies spam and ham SMS most accurately, with 99% accuracy. The proposed model, trained with TF-IDF features and over-sampling, reliably identifies whether an SMS is ham or spam. The proposed approach was also evaluated on a spam email dataset, where it likewise achieved 99% accuracy.
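To make the two preprocessing steps named in the abstract concrete, here is a minimal sketch of TF-IDF feature extraction followed by random over-sampling of the minority class. It uses only the Python standard library. The toy messages, the function names, the whitespace tokenizer, and the `ln(N / df) + 1` weighting variant are all assumptions made for illustration; the abstract does not specify which TF-IDF variant or over-sampling method the authors used, and the classifiers themselves are left out.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, vectors), weighting each term by tf * (ln(N / df) + 1)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: number of messages that contain each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n_docs / df[t]) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty messages
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes match the largest."""
    rng = random.Random(seed)
    by_label = {}
    for row, label in zip(X, y):
        by_label.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_label.values())
    X_out, y_out = [], []
    for label, rows in by_label.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


# Toy messages invented for illustration; they are not from the paper's dataset.
messages = [
    "win a free prize now",
    "are we meeting for lunch today",
    "call me when you get home",
    "see you at the office tomorrow",
]
labels = ["spam", "ham", "ham", "ham"]

vocab, X = tfidf_vectors(messages)
X_bal, y_bal = random_oversample(X, labels)
print(Counter(y_bal))  # class counts after balancing
```

In a real experiment, over-sampling would normally be applied to the training split only, so that duplicated messages do not leak into the test set, and the balanced vectors would then be passed to a classifier such as a random forest.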

Spam Filtering Based on Latent Semantic Indexing

Large Margin Classification for Combating Disguise Attacks on Spam Filters

TopicSpam: a Topic-Model Based Approach for Spam Detection.

A Local-Concentration-Based Feature Extraction Approach for Spam Filtering.

Next-Generation Spam Filtering: Comparative Fine-Tuning of LLMs, NLPs, and CNN Models for Email Spam Classification

Application of Natural Language Processing and Machine Learning Boosted with Swarm Intelligence for Spam Email Filtering

Evading obscure communication from spam emails

Extracting discriminative information from e-mail for spam detection inspired by Immune System

Effective spam filter based on a hybrid method of header checking and content parsing

Deep learning to filter SMS Spam

Intelligent Detection Approaches for Spam

A semantic-based model with a hybrid feature engineering process for accurate spam detection

Concentration Based Feature Construction Approach for Spam Detection.

Training SpamAssassin with Active Semi-supervised Learning

Semantic Graph Based Convolutional Neural Network for Spam e-mail Classification in Cybercrime Applications

Variable Length Concentration Based Feature Construction Method for Spam Detection

Voting for Deceptive Opinion Spam Detection

Spam SMS filtering based on text features and supervised machine learning techniques

Combining SVM with Orthogonal Centroid Feature Selection for Spam Filtering

Detecting Spam E-mails with Content and Weight-based Binomial Logistic Model

Content-Based Spam Filtering on Video Sharing Social Networks