Abstract: Phishing emails, an escalating threat, have become increasingly sophisticated with the rise of Large Language Models (LLMs). As attackers exploit LLMs to craft more convincing and evasive phishing emails, it is crucial to assess the resilience of current phishing defenses. In this study, we conduct a comprehensive evaluation of traditional phishing detectors, such as Gmail Spam Filter, Apache SpamAssassin, and Proofpoint, as well as machine learning models like SVM, Logistic Regression, and Naive Bayes, in identifying both traditional and LLM-rephrased phishing emails. We also explore the emerging role of LLMs as phishing detection tools, a method already adopted by companies like NTT Security Holdings and JPMorgan Chase. Our results reveal notable declines in detection accuracy for rephrased emails across all detectors, highlighting critical weaknesses in current phishing defenses. As the threat landscape evolves, our findings underscore the need for stronger security controls and regulatory oversight of LLM-generated content to prevent its misuse in creating advanced phishing attacks. This study contributes to more effective Cyber Threat Intelligence (CTI) by leveraging LLMs to generate diverse phishing variants for data augmentation and by harnessing LLMs to enhance phishing detection, paving the way for more robust and adaptable threat detection systems.
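To make the reported decline concrete, the following Python sketch shows one way a drop in detection rate between original and rephrased phishing emails could be measured. It assumes a hypothetical keyword-rule detector (`keyword_detector` and `SUSPICIOUS_PATTERNS` are illustrative names, not the rules of SpamAssassin or any other product evaluated in the paper) and three hand-written toy emails per set that stand in for real and LLM-rephrased phishing. The numbers it prints are illustrative only and are not the paper's results.

```python
import re

# Hypothetical lexical rules for illustration only; these are NOT the actual
# rules of SpamAssassin, Gmail, Proofpoint, or any other product.
SUSPICIOUS_PATTERNS = [
    r"\bverify your account\b",
    r"\burgent\b",
    r"\bclick here\b",
    r"\bpassword\b",
]


def keyword_detector(email: str, threshold: int = 2) -> bool:
    """Flag an email as phishing if at least `threshold` patterns match."""
    text = email.lower()
    hits = sum(1 for pattern in SUSPICIOUS_PATTERNS if re.search(pattern, text))
    return hits >= threshold


def detection_rate(detector, phishing_emails: list[str]) -> float:
    """Fraction of known-phishing emails the detector flags.

    On a test set that contains only phishing emails, this equals accuracy.
    """
    flagged = sum(1 for email in phishing_emails if detector(email))
    return flagged / len(phishing_emails)


# Toy data: each "rephrased" email is a hand-written stand-in for an LLM rewrite
# of the corresponding original, keeping the intent but changing the wording.
ORIGINAL = [
    "URGENT: verify your account now or it will be suspended. Click here.",
    "Your password expires today. Click here to keep it.",
    "Urgent notice: confirm your password to avoid losing access.",
]
REPHRASED = [
    "We noticed unusual sign-in activity; please confirm your details at the link below.",
    "Your login credentials are due for renewal. Use the secure portal to update them today.",
    "Kindly review your account details and reset your password; the matter is urgent.",
]

orig_rate = detection_rate(keyword_detector, ORIGINAL)
reph_rate = detection_rate(keyword_detector, REPHRASED)
print(f"original={orig_rate:.2f} rephrased={reph_rate:.2f} drop={orig_rate - reph_rate:.2f}")
# prints: original=1.00 rephrased=0.33 drop=0.67
```

Because every test email here is phishing, the detection rate equals accuracy; a fuller evaluation would also include legitimate emails so that false positives are counted.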


### What problems does this paper attempt to solve?

This paper addresses the challenge that phishing emails generated by large language models (LLMs) pose to existing detection systems. As LLM technology advances, attackers can use these models to produce more convincing and evasive phishing emails that bypass traditional anti-phishing tools. Specifically:

1. **Evaluating the effectiveness of traditional detection tools**: The researchers comprehensively evaluated existing anti-phishing tools (such as the Gmail spam filter, Apache SpamAssassin, and Proofpoint) and machine-learning models (such as SVM, logistic regression, and Naive Bayes), testing how well they identify both traditional phishing emails and phishing emails rephrased by LLMs.
2. **Exploring LLMs as detection tools**: The paper also explores whether LLMs can serve as phishing detectors, an approach that companies such as NTT Security Holdings and JPMorgan Chase have already adopted.
3. **Revealing weaknesses in current defenses**: The results show notable declines in detection accuracy on LLM-rephrased emails across all detectors, exposing critical weaknesses in existing defenses.
4. **Proposing improvements**: The paper calls for stronger security controls and regulatory oversight to prevent LLMs from being misused to create advanced phishing attacks. It also proposes improving detection through data augmentation and better training data to cope with an increasingly complex threat environment.
5. **Advancing Cyber Threat Intelligence (CTI)**: Ultimately, the research supports the development of more effective CTI, in particular by using LLMs to generate diverse phishing variants for data augmentation, making detection systems more robust and adaptable.

Through these efforts, the researchers aim to strengthen defenses against LLM-generated phishing emails and help security systems keep pace with technological change.
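The data-augmentation idea can be illustrated with a from-scratch multinomial Naive Bayes classifier (add-one Laplace smoothing, words unseen in training ignored). This is a minimal sketch, not the paper's implementation: the `NaiveBayes` class, the toy emails, and the "rephrased" variants are hypothetical, and the variants were written by hand to mimic LLM rewrites rather than produced by an LLM. The classifier is trained once on the original data only and once with two rephrased phishing variants added, then asked to classify a rephrased phishing email that appears in neither training set.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.doc_counts}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        # Vocabulary = every word seen in any class during training.
        self.vocab = set().union(*self.word_counts.values())
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.doc_counts:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            denom = self.total_words[c] + self.alpha * len(self.vocab)
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][w] + self.alpha) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical toy corpus.
LEGIT = [
    "team meeting agenda for monday",
    "please review the quarterly report",
    "lunch with the team on friday",
]
PHISH = [
    "urgent verify your account password",
    "click here to verify your password",
    "urgent account suspended click here",
]
# Hand-written stand-ins for LLM rephrasings, used only for augmentation.
PHISH_REPHRASED_AUG = [
    "kindly confirm your details on the secure portal",
    "review the attached notice and confirm your login details",
]
# A rephrased phishing email that is in neither training set.
TEST_REPHRASED = "kindly review the attached invoice and confirm your details"

base = NaiveBayes().fit(LEGIT + PHISH, ["legit"] * len(LEGIT) + ["phishing"] * len(PHISH))
augmented = NaiveBayes().fit(
    LEGIT + PHISH + PHISH_REPHRASED_AUG,
    ["legit"] * len(LEGIT) + ["phishing"] * (len(PHISH) + len(PHISH_REPHRASED_AUG)),
)
print(base.predict(TEST_REPHRASED))       # legit (the rephrased phish is missed)
print(augmented.predict(TEST_REPHRASED))  # phishing
```

Keeping the augmentation variants separate from the test email avoids leaking the test item into training. With a corpus this small, the result only demonstrates the mechanism and says nothing about the size of the effect on real data.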

Next-Generation Phishing: How LLM Agents Empower Cyber Attackers

Large Language Model Lateral Spear Phishing: A Comparative Study in Large-Scale Organizational Settings

ChatSpamDetector: Leveraging Large Language Models for Effective Phishing Email Detection

Evaluating the Efficacy of Large Language Models in Identifying Phishing Attempts

From ML to LLM: Evaluating the Robustness of Phishing Webpage Detection Models against Adversarial Attacks

SecureNet: A Comparative Study of DeBERTa and Large Language Models for Phishing Detection

When LLMs Go Online: The Emerging Threat of Web-Enabled LLMs

ChatPhishDetector: Detecting Phishing Sites Using Large Language Models

SpearBot: Leveraging Large Language Models in a Generative-Critique Framework for Spear-Phishing Email Generation

Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models

From Chatbots to PhishBots? -- Preventing Phishing scams created using ChatGPT, Google Bard and Claude

Advancing Phishing Email Detection: A Comparative Study of Deep Learning Models

Devising and Detecting Phishing Emails Using Large Language Models

A Systematic Review on Deep-Learning-Based Phishing Email Detection

Automated Phishing Detection Using URLs and Webpages

Assessing AI vs Human-Authored Spear Phishing SMS Attacks: An Empirical Study Using the TRAPD Method

A Systematic Review of Deep Learning Techniques for Phishing Email Detection

Targeted Phishing Campaigns using Large Scale Language Models

An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach

A Survey of Large Language Models for Cyber Threat Detection

Multimodal Large Language Models for Phishing Webpage Detection and Identification