Abstract: In recent years, spammers have tried to obfuscate their intent by sending hybrid spam e-mails that combine image and text parts, which are more challenging to detect than e-mails containing only text or only images. The motivation behind this research is to design an effective approach for filtering out hybrid spam e-mails, avoiding situations where traditional filters that rely on text alone or on images alone fail to detect them. To the best of our knowledge, only a few studies have addressed the detection of hybrid spam e-mails. Ordinarily, Optical Character Recognition (OCR) technology is used to handle the image parts of spam by transforming images into text. However, although OCR scanning is a very successful technique for processing text-and-image hybrid spam, it is not an effective solution for large volumes of e-mail because of the CPU power required and the execution time it takes to scan e-mail files. Moreover, OCR techniques are not always reliable in the transformation process. To address these problems, we propose new late multi-modal fusion training frameworks for a text-and-image hybrid spam e-mail filtering system and compare them with classical early fusion detection frameworks based on the OCR method. A Convolutional Neural Network (CNN) and Continuous Bag of Words (CBOW) were implemented to extract features from the image and text parts of hybrid spam, respectively, and the generated features were fed to a sigmoid layer and to Machine Learning based classifiers, including Random Forest (RF), Decision Tree (DT), Naive Bayes (NB) and Support Vector Machine (SVM), to classify each e-mail as ham or spam.
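To make the late fusion idea concrete, here is a minimal sketch of decision-level late fusion, under stated assumptions. It is not the authors' implementation. The CNN and CBOW feature extractors are replaced by hypothetical precomputed feature vectors, each modality is scored by a single sigmoid unit with made-up weights, and the two per-modality spam probabilities are combined by a weighted average. The names `modality_score`, `late_fusion` and `classify`, the equal fusion weight and the 0.5 threshold are all illustrative assumptions; the RF, DT, NB and SVM classifiers mentioned above are not shown.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def modality_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Spam probability for one modality from a single sigmoid unit.

    In the described framework the features would come from a CNN (image part)
    or CBOW embeddings (text part); here they are hypothetical inputs.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return sigmoid(z)


def late_fusion(p_text: float, p_image: float, w_text: float = 0.5) -> float:
    """Decision-level fusion: weighted average of per-modality spam probabilities.

    The equal default weighting is an assumption, not a value from the paper.
    """
    return w_text * p_text + (1.0 - w_text) * p_image


def classify(p_spam: float, threshold: float = 0.5) -> str:
    """Label an e-mail as spam or ham from its (fused) spam probability."""
    return "spam" if p_spam >= threshold else "ham"


if __name__ == "__main__":
    # Hypothetical e-mail whose text looks innocent but whose image looks spammy.
    p_text = modality_score([0.2, -0.1, 0.4], [1.0, 1.0, 1.0], bias=-1.0)
    p_image = modality_score([1.5, 0.8, 2.0], [1.0, 1.0, 1.0], bias=-1.0)
    fused = late_fusion(p_text, p_image)
    print(classify(p_text), classify(p_image), classify(fused))
```

In this example a text-only filter would label the e-mail ham (its text score is below 0.5), while the fused score crosses the threshold because the image part carries the spam signal. That is the failure mode of single-modality filters that the abstract describes. Because each modality is scored independently, no OCR pass is needed before classification.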

Combining Naive Bayes and Tri-gram Language Model for Spam Filtering

Camouflaged Chinese Spam Content Detection with Semi-supervised Generative Active Learning

Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam Detection

A Spam Filtering Method Based on Multi-Modal Fusion

Research on Advanced Filtering Algorithm for Spam Email Based on Bayes Parameter Estimation

PRIS Kidult Anti-SPAM Solution at the TREC 2005 Spam Track: Improving the Performance of Naive Bayes for Spam Detection

Continuous Bayesian Spam Filtering Based on Content

Content-based Spam Email Detection Using N-gram Machine Learning Approach

Simplified Chinese Spam Mail Filter: Design and Performance Evaluation

A Late Multi-Modal Fusion Model for Detecting Hybrid Spam E-mail

A Spam Filtering Method Based on Bayesian Neural Network

T-Bert: A Spam Review Detection Model Combining Group Intelligence and Personalized Sentiment Information

Research on Behavior Statistic Based Spam Filter

Combining Neural Networks and Semantic Feature Space for Email Classification

Research on Advanced Filtering Algorithm for Anti-Spam Based on a Bayesian Classification Model

FBS-TGCN: A Temporal Graph-Convolutional-Network Model for Spatiotemporal Prediction of Spam Messages from Fake Base Stations

Evaluating the Performance of ChatGPT for Spam Email Detection

The Improved Logistic Regression Models for Spam Filtering

A Hybrid Model with New Word Weighting for Fast Filtering Spam Short Texts

An Adaptive Concentration Selection Model for Spam Detection

Application of Natural Language Processing and Machine Learning Boosted with Swarm Intelligence for Spam Email Filtering