Next-Generation Spam Filtering: Comparative Fine-Tuning of LLMs, NLPs, and CNN Models for Email Spam Classification

Konstantinos I. Roumeliotis, Nikolaos D. Tselikas, Dimitrios K. Nasiopoulos

DOI: https://doi.org/10.3390/electronics13112034

IF: 2.9

2024-05-24

Electronics

Abstract: Spam emails and phishing attacks continue to pose significant challenges to email users worldwide, necessitating advanced techniques for their efficient detection and classification. In this paper, we address the persistent challenges of spam emails and phishing attacks by introducing a cutting-edge approach to email filtering. Our methodology revolves around harnessing the capabilities of advanced language models, particularly the state-of-the-art GPT-4 Large Language Model (LLM), along with BERT and RoBERTa Natural Language Processing (NLP) models. Through meticulous fine-tuning tailored for spam classification tasks, we aim to surpass the limitations of traditional spam detection systems, such as Convolutional Neural Networks (CNNs). Through an extensive literature review, experimentation, and evaluation, we demonstrate the effectiveness of our approach in accurately identifying spam and phishing emails while minimizing false positives. Our methodology showcases the potential of fine-tuning LLMs for specialized tasks like spam classification, offering enhanced protection against evolving spam and phishing attacks. This research contributes to the advancement of spam filtering techniques and lays the groundwork for robust email security systems in the face of increasingly sophisticated threats.

engineering, electrical & electronic; computer science, information systems; physics, applied

What problem does this paper attempt to address?

This paper addresses the ongoing challenges posed by spam emails and phishing attacks. The authors propose an approach to email filtering that fine-tunes large language models (LLMs) and natural language processing (NLP) models for spam classification and compares them with convolutional neural networks (CNNs). Specifically, the paper covers the following points:

1. **Adopted Models**: The paper focuses on the GPT-4 large language model and the BERT and RoBERTa NLP models, and compares them with traditional CNN models.
2. **Fine-tuning Strategies**: By fine-tuning these models for the spam classification task, the approach aims to surpass the limitations of traditional spam detection systems.
3. **Experimental Evaluation**: Through a literature review, experiments, and evaluation, the authors report that the proposed method accurately identifies spam and phishing emails while minimizing false positives.
4. **Contributions and Objectives**: The paper evaluates and compares the effectiveness of the different models on the spam detection task, and analyzes specific research questions such as the predictive capabilities of the models, the importance of fine-tuning, and the models' ability to generalize.

In summary, the paper introduces an approach that aims to improve spam detection accuracy by fine-tuning advanced language models, thereby providing better protection for email users.
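The evaluation goal stated in the summary, accurate identification of spam while minimizing false positives, can be made concrete with standard confusion-matrix metrics. The following is a minimal sketch, not the authors' evaluation code: the function name `evaluate_spam_classifier`, the `SpamMetrics` container, and the `"spam"`/`"ham"` label strings are assumptions introduced here for illustration, and the paper may report a different set of metrics.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SpamMetrics:
    """Hypothetical container for binary spam-classification metrics."""
    accuracy: float
    precision: float
    recall: float
    f1: float
    false_positive_rate: float


def evaluate_spam_classifier(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = "spam",  # assumed label for the spam class
) -> SpamMetrics:
    """Compare predicted labels with gold labels and compute metrics."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # spam correctly flagged
        elif pred == positive and truth != positive:
            fp += 1  # legitimate email wrongly flagged (false positive)
        elif pred != positive and truth == positive:
            fn += 1  # spam that slipped through
        else:
            tn += 1  # legitimate email correctly passed

    def ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * tp, 2 * tp + fp + fn)
    return SpamMetrics(
        accuracy=ratio(tp + tn, tp + tn + fp + fn),
        precision=precision,
        recall=recall,
        f1=f1,
        false_positive_rate=ratio(fp, fp + tn),
    )
```

Calling `evaluate_spam_classifier(gold_labels, model_predictions)` on a held-out test set yields one `SpamMetrics` record per model, so fine-tuned and baseline models can be compared on the same footing. The false positive rate is reported separately because, in email filtering, wrongly flagging a legitimate message is often the more costly error.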

Large-Margin Classification for Combating Disguise Attacks on Spam Filters

Investigating the Effectiveness of Bayesian Spam Filters in Detecting LLM-modified Spam Mails

Next-Generation Phishing: How LLM Agents Empower Cyber Attackers

ChatSpamDetector: Leveraging Large Language Models for Effective Phishing Email Detection

Application of Natural Language Processing and Machine Learning Boosted with Swarm Intelligence for Spam Email Filtering

Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam Detection

Precision in Classification: A Comparative Study of Logistic Regression, Naive Bayes, LSTM, and CNN for Spam Email Detection

Email spam detection by deep learning models using novel feature selection technique and BERT

Advancing Phishing Email Detection: A Comparative Study of Deep Learning Models

An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach

A Systematic Review on Deep-Learning-Based Phishing Email Detection

Semantic Graph Based Convolutional Neural Network for Spam e-mail Classification in Cybercrime Applications

Spam Filtering Based on Latent Semantic Indexing

Filtering and Detection of Real-Time Spam Mail Based on a Bayesian Approach in University Networks

Email Spam Detection using Deep Learning Approach

A Comprehensive Survey for Intelligent Spam Email Detection

Evaluating the Efficacy of Large Language Models in Identifying Phishing Attempts

Targeted Phishing Campaigns using Large Scale Language Models

Development of a Machine Learning Model for Image-based Email Spam Detection

Evading obscure communication from spam emails