Evaluating Large Language Models' Capability to Launch Fully Automated Spear Phishing Campaigns: Validated on Human Subjects

Fred Heiding, Simon Lermen, Andrew Kao, Bruce Schneier, Arun Vishwanath
2024-12-01
Abstract: In this paper, we evaluate the capability of large language models to conduct personalized phishing attacks and compare their performance with human experts and AI models from last year. We include four email groups with a combined total of 101 participants: a control group of arbitrary phishing emails, which received a click-through rate (recipient pressed a link in the email) of 12%, emails generated by human experts (54% click-through), fully AI-automated emails (54% click-through), and AI emails utilizing a human-in-the-loop (56% click-through). Thus, the AI-automated attacks performed on par with human experts and 350% better than the control group. The results are a significant improvement from similar studies conducted last year, highlighting the increased deceptive capabilities of AI models. Our AI-automated emails were sent using a custom-built tool that automates the entire spear phishing process, including information gathering and creating personalized vulnerability profiles for each target. The AI-gathered information was accurate and useful in 88% of cases and only produced inaccurate profiles for 4% of the participants. We also use language models to detect the intention of emails. Claude 3.5 Sonnet scored well above 90% with low false-positive rates and detected several seemingly benign emails that passed human detection. Lastly, we analyze the economics of phishing, highlighting how AI enables attackers to target more individuals at lower cost and increase profitability by up to 50 times for larger audiences.
Cryptography and Security
### What problem does this paper attempt to solve?

This paper evaluates the capability of large language models (LLMs) to conduct personalized phishing attacks and compares their performance with that of human experts and of last year's AI models. The researchers address the question from four angles:

1. **Evaluating the deceptive ability of AI models**:
   - The researchers tested four types of email experimentally: a control group (arbitrary phishing emails), phishing emails written by human experts, phishing emails generated fully automatically by AI, and AI-generated phishing emails with a human in the loop.
   - The fully AI-automated emails reached a click-through rate of 54%, matching the emails written by human experts, while the AI emails with a human in the loop reached 56%. The control group reached 12%.
2. **Increasing the degree of automation**:
   - The researchers developed a custom tool that automates the entire phishing process: gathering information, creating personalized vulnerability profiles, generating and sending phishing emails, and evaluating the success rate of attack strategies.
   - The information gathered by the tool was accurate and useful in 88% of cases, and it produced inaccurate profiles for only 4% of participants.
3. **Detecting intentions**:
   - The researchers used five popular large language models (including Claude 3.5 Sonnet and GPT-4o) to detect the intention of emails, in particular to identify potential phishing emails.
   - Claude 3.5 Sonnet performed particularly well, scoring well above 90% with a low false-positive rate and flagging several seemingly benign emails that had passed human detection.
4. **Economic analysis**:
   - The study also analyzed how AI changes the economics of phishing: it lets attackers target more people at a lower cost, increasing profitability by up to 50 times for larger audiences.

### Main contributions

1. **Creating an evaluation benchmark**:
   - The researchers created a benchmark for evaluating the ability of AI to generate phishing emails automatically and compared the results with studies from last year.
2. **Automating the entire phishing process**:
   - They demonstrated how large language models can automate every step of a phishing attack, not just email generation.
3. **Improving detection methods**:
   - They provided a simple and efficient method for detecting phishing emails: prompting the model to be suspicious of an email's authenticity improved detection without increasing the false-positive rate.
4. **Economic impact analysis**:
   - They provided a detailed economic analysis showing that AI-enhanced phishing significantly increases the incentives for attackers.

### Conclusion

The results show that large language models can already generate phishing emails as effectively as human experts: the fully automated emails matched the experts' 54% click-through rate, and the AI emails with a human in the loop reached 56%. In addition, progress in AI has significantly reduced the cost of phishing attacks and increased attackers' profits. The authors therefore call on researchers, policymakers, and technology practitioners to take AI-enhanced phishing seriously and to adopt new defense strategies and technical measures against it.
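
The abstract's statement that the AI-automated emails performed "350% better than the control group" is a relative increase over the 12% control click-through rate, which is the same as 4.5 times that rate. The arithmetic can be checked with a minimal sketch. The click-through rates are those reported in the abstract; the names `click_through` and `relative_improvement` are illustrative and do not come from the paper.

```python
# Click-through rates reported in the abstract, in percent.
click_through = {
    "control": 12,
    "human_expert": 54,
    "ai_automated": 54,
    "ai_human_in_the_loop": 56,
}


def relative_improvement(rate, baseline):
    """Return the percentage by which `rate` exceeds `baseline`."""
    return (rate - baseline) / baseline * 100


baseline = click_through["control"]
for group, rate in click_through.items():
    if group == "control":
        continue
    lift = relative_improvement(rate, baseline)
    print(f"{group}: {lift:.0f}% above control ({rate / baseline:.2f}x)")
```

For the fully automated group this gives (54 - 12) / 12 = 3.5, that is, 350% above the control rate.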