Abstract: Every year, phishing results in losses of billions of dollars and is a major threat to the Internet economy. Phishing attacks are now most often carried out by email. Several review studies have been performed to better comprehend existing research trends in phishing email detection. However, it is important to assess this issue from different perspectives. None of these surveys has comprehensively studied the use of Natural Language Processing (NLP) techniques for phishing detection, except one that shed light on the use of NLP techniques for classification and training purposes while exploring a few alternatives. To bridge this gap, this study systematically reviews and synthesises research on the use of NLP for detecting phishing emails. Based on specific predefined criteria, a total of 100 research articles published between 2006 and 2022 were identified and analysed. We study the key research areas in phishing email detection using NLP, the machine learning algorithms used in phishing email detection, the text features of phishing emails, the datasets and resources that have been used in phishing email detection, and the evaluation criteria. The findings show that the main research area in phishing detection studies is feature extraction and selection, followed by methods for classifying and optimising the detection of phishing emails. Among the range of classification algorithms, support vector machines (SVMs) are heavily utilised for detecting phishing emails. The most frequently used NLP techniques are TF-IDF and word embeddings. Furthermore, the most commonly used dataset for benchmarking phishing email detection methods is the Nazario phishing corpus, and Python is the most commonly used programming language for phishing email detection. It is expected that the findings of this paper will be helpful for the scientific community, especially in the field of NLP applications to cybersecurity problems.
This survey is also unique in that it relates works to their openly available tools and resources. The analysis of the presented works revealed that little work has been performed on Arabic-language phishing emails using NLP techniques. Therefore, many open issues remain in Arabic phishing email detection.
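
To make concrete what TF-IDF, one of the most frequently used NLP techniques identified above, computes, the following is a minimal sketch in plain Python. It is not taken from any of the reviewed studies: the tokenizer, the function names, the unsmoothed weighting formula, and the toy messages are all assumptions chosen for illustration. Library implementations typically add smoothing and normalisation, so their numbers will differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,!?:;\"'()") for word in text.lower().split())
    return [t for t in tokens if t]


def tfidf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = ln(N / df), where df is the number of documents containing the term
    (a basic, unsmoothed variant; assumed here for simplicity)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Count each term once per document to obtain document frequencies.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy messages invented for illustration; not drawn from any corpus.
emails = [
    "Verify your account now to avoid suspension",
    "Your account statement is attached",
    "Lunch meeting moved to noon",
]

for email, vector in zip(emails, tfidf(emails)):
    top = sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(email, "->", [term for term, _ in top])
```

Terms that occur in few messages (such as "verify" here) receive higher weights than terms shared across messages (such as "account"), which is why such vectors are a common input to classifiers like SVMs.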
