Evaluating the Performance of ChatGPT for Spam Email Detection

Shijing Si, Yuwei Wu, Le Tang, Yugui Zhang, Jedrek Wosik

2024-06-19

Abstract: Email continues to be a pivotal and extensively utilized communication medium within professional and commercial domains. Nonetheless, the prevalence of spam emails poses a significant challenge for users, disrupting their daily routines and diminishing productivity. Consequently, accurately identifying and filtering spam based on content has become crucial for cybersecurity. Recent advancements in natural language processing, particularly large language models like ChatGPT, have shown remarkable performance in tasks such as question answering and text generation. However, ChatGPT's potential in spam identification remains underexplored. To fill this gap, this study evaluates ChatGPT's capabilities for spam identification on both English and Chinese email datasets. We employ ChatGPT for spam email detection using in-context learning, which requires a prompt instruction and a few demonstrations. We also investigate how the number of demonstrations in the prompt affects the performance of ChatGPT. For comparison, we implement five popular benchmark methods: naive Bayes, support vector machines (SVM), logistic regression (LR), feedforward dense neural networks (DNN), and BERT classifiers. Extensive experiments show that ChatGPT performs significantly worse than deep supervised learning methods on the large English dataset, while it presents superior performance on the low-resource Chinese dataset.
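The in-context learning setup described in the abstract (a prompt instruction followed by a few labeled demonstrations) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the instruction wording, the `Email:`/`Label:` layout, and the helper names `build_prompt` and `parse_label` are all assumptions, and the call to the model itself is left out.

```python
from typing import List, Tuple

# Hypothetical instruction text; the prompt wording used in the paper is not given here.
INSTRUCTION = "Classify the following email as 'spam' or 'ham'. Answer with one word."


def build_prompt(demos: List[Tuple[str, str]], email: str, k: int) -> str:
    """Assemble a few-shot prompt from the first k (text, label) demonstrations."""
    if k < 0:
        raise ValueError("k must be non-negative")
    parts = [INSTRUCTION, ""]
    for text, label in demos[:k]:
        parts.append(f"Email: {text}")
        parts.append(f"Label: {label}")
        parts.append("")
    # The query email comes last, with the label left open for the model to fill in.
    parts.append(f"Email: {email}")
    parts.append("Label:")
    return "\n".join(parts)


def parse_label(response: str) -> str:
    """Map a free-text reply to 'spam' or 'ham' based on its first word.

    Anything that does not start with 'spam' is treated as 'ham' (an assumption).
    """
    tokens = response.strip().lower().split()
    if not tokens:
        return "ham"
    first = tokens[0].strip(".,!?:;'\"")
    return "spam" if first == "spam" else "ham"
```

Varying `k` in `build_prompt` is one way to study how the number of demonstrations affects performance; with `k = 0` the prompt reduces to the instruction plus the query email.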

Computation and Language, Artificial Intelligence, Computers and Society, Machine Learning

What problem does this paper attempt to address?

This paper evaluates the performance of the large language model ChatGPT on the spam detection task. Specifically, the researchers examine whether ChatGPT can effectively identify spam through in-context learning, on both English and Chinese datasets. The study also explores how the number of demonstrations in the prompt affects ChatGPT's performance and compares ChatGPT with several popular benchmark methods: naive Bayes, support vector machines (SVM), logistic regression (LR), feedforward dense neural networks (DNN), and BERT classifiers. The main purpose of the study is to fill the current research gap regarding ChatGPT in spam identification and to explore its potential on low-resource datasets.
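For context on the supervised baselines, the simplest of them (naive Bayes) can be sketched in a few lines. This is an illustrative multinomial naive Bayes with Laplace smoothing, not the configuration used in the paper: the class name `NaiveBayesSpam`, the whitespace tokenizer, and the smoothing value `alpha=1.0` are assumptions, and Chinese text would need a proper word segmenter in place of `split()`.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set


class NaiveBayesSpam:
    """Multinomial naive Bayes over word counts with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                          # smoothing strength (assumed value)
        self.word_counts: Dict[str, Counter] = {}   # label -> word frequencies
        self.doc_counts: Counter = Counter()        # label -> number of documents
        self.vocab: Set[str] = set()

    @staticmethod
    def tokenize(text: str) -> List[str]:
        # Whitespace tokenization is an assumption made for this sketch.
        return text.lower().split()

    def fit(self, texts: Iterable[str], labels: Iterable[str]) -> "NaiveBayesSpam":
        for text, label in zip(texts, labels):
            tokens = self.tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> Optional[str]:
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        tokens = [t for t in self.tokenize(text) if t in self.vocab]  # skip unseen words
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log(
                    (counts[tok] + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Unlike the in-context approach, a baseline of this kind needs a labeled training set before it can classify anything, which is why the size of the dataset matters so much in the comparison.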

Camouflaged Chinese Spam Content Detection with Semi-supervised Generative Active Learning

Chinese Spam Detection Based on Prompt Tuning

Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection

ChatGPT-Crawler: Find out if ChatGPT really knows what it's talking about

ChatSpamDetector: Leveraging Large Language Models for Effective Phishing Email Detection

To ChatGPT, or not to ChatGPT: That is the question!

A Preliminary Study of ChatGPT on News Recommendation: Personalization, Provider Fairness, Fake News

ChatGPT or Grammarly? Evaluating ChatGPT on Grammatical Error Correction Benchmark

Fighting Fire with Fire: Can ChatGPT Detect AI-generated Text?

Investigating the Effectiveness of Bayesian Spam Filters in Detecting LLM-modified Spam Mails

On the Generalization of Training-based ChatGPT Detection Methods

A Survey on the Real Power of ChatGPT

Building an Effective Email Spam Classification Model with spaCy

Is ChatGPT a Good NLG Evaluator? A Preliminary Study

Analyzing the Text Contents Produced by ChatGPT: Prompts, Feature-Components in Responses, and a Predictive Model

Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam Detection

Is ChatGPT a General-Purpose Natural Language Processing Task Solver?

FakeGPT: Fake News Generation, Explanation and Detection of Large Language Models