Abstract: Advances in technology have left a significant mark on every field of life, including medicine, music, office work, travel, and communication. Telephone lines were once the primary communication medium. Today, wireless technology has largely superseded wired telephony and offers much broader features. Advertising agencies and spammers mostly use SMS to push their business brochures to ordinary users. As a result, spam reportedly accounts for more than 60% of the SMS received daily. These spam messages annoy users and sometimes defraud them, while generating large profits for spammers and advertising companies. This study proposes an approach for classifying spam and ham SMS using supervised machine learning techniques. Feature extraction techniques, namely Term Frequency-Inverse Document Frequency (TF-IDF) and bag-of-words, are used to extract features from the data. The SMS dataset used was imbalanced, and to address this problem we applied over-sampling and under-sampling techniques. A support vector classifier, gradient boosting machine, random forest, Gaussian Naive Bayes, and logistic regression are applied to the spam and ham SMS dataset, and their performance is evaluated using accuracy, precision, recall, and F1 score. The experimental results show that random forest classifies spam and ham SMS most accurately, with 99% accuracy. The proposed model performs best at identifying the SMS category (ham or spam) when trained with TF-IDF features and the over-sampling technique. The proposed approach was also evaluated on a spam email dataset, where it likewise achieved 99% accuracy.
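
The two preprocessing steps named in the abstract, TF-IDF feature extraction and over-sampling of the minority class, can be illustrated with a minimal standard-library sketch. This is not the study's implementation: the function names `tfidf` and `random_oversample`, the whitespace tokenizer, the weighting variant tf × (ln(N/df) + 1), and the toy messages are all assumptions made for illustration, and the classifier stage (for example, random forest) is omitted.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: weight = tf * (ln(N / df) + 1), where tf is the
    term's relative frequency within the document.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_label.values())
    out_samples, out_labels = [], []
    for label, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Hypothetical toy data: four ham messages and one spam message.
messages = [
    "are we still meeting for lunch today",
    "call me when you get home",
    "win a free prize now call now",
    "see you at the office tomorrow",
    "thanks for the lift home",
]
labels = ["ham", "ham", "spam", "ham", "ham"]

features = tfidf(messages)
balanced_features, balanced_labels = random_oversample(features, labels)
```

In practice, over-sampling would be applied to the training split only, so that duplicated messages do not leak into the evaluation data.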

A Support Vector Machine Based Naive Bayes Algorithm for Spam Filtering

Classify E-mails by Support Vector Machine

Spam Message Self-Adaptive Filtering System Based on Naive Bayes and Support Vector Machine

Classifying e-mails via support vector machine

Attention Mechanism and Support Vector Machine for Image-Based E-Mail Spam Filtering

Application of Natural Language Processing and Machine Learning Boosted with Swarm Intelligence for Spam Email Filtering

Combining Svm With Orthogonal Centroid Feature Selection For Spam Filtering

Content-based Spam Email Detection Using N-gram Machine Learning Approach

Spam SMS filtering based on text features and supervised machine learning techniques

A Spam Filtering Method Based on Bayesian Neural Network

Intelligent Detection Approaches for Spam

Novel method for Chinese spam detection based on one-class support vector machines

A Late Multi-Modal Fusion Model for Detecting Hybrid Spam E-mail

Classification of Spam Emails through Hierarchical Clustering and Supervised Learning

Training SVM Email Classifiers Using Very Large Imbalanced Dataset

Email Classification Using Behavior and Time Features

Machine intelligence based hybrid classifier for spam detection and sentiment analysis of SMS messages

A Hybrid Model with New Word Weighting for Fast Filtering Spam Short Texts

PRIS Kidult Anti-SPAM Solution at the TREC 2005 Spam Track: Improving the Performance of Naive Bayes for Spam Detection.

Precision in Classification: A Comparative Study of Logistic Regression, Naive Bayes, LSTM, and CNN for Spam Email Detection

Effective spam filter based on a hybrid method of header checking and content parsing