Abstract: Can we trust Large Language Models (LLMs) to accurately predict scams? This paper investigates the vulnerabilities of LLMs when facing adversarial scam messages for the task of scam detection. We addressed this issue by creating a comprehensive dataset with fine-grained labels of scam messages, including both original and adversarial scam messages. The dataset extended traditional binary classes for the scam detection task into more nuanced scam types. Our analysis showed how adversarial examples took advantage of vulnerabilities of an LLM, leading to a high misclassification rate. We evaluated the performance of LLMs on these adversarial scam messages and proposed strategies to improve their robustness.

What problem does this paper attempt to address?

This paper addresses the vulnerability of large language models (LLMs) to adversarial scam messages. Specifically, it measures how much LLM scam-detection performance degrades on adversarial inputs, using a comprehensive dataset that contains both original and adversarial scam messages. The research aims to show how adversarial examples exploit the weaknesses of LLMs, leading to a high misclassification rate, and proposes strategies to improve model robustness.

### Main Contributions

1. **Creation of a Comprehensive Scam Dataset**:
   - Developed a carefully labeled dataset of scam messages, including both original and adversarial scam messages.
   - The dataset extends traditional binary classification with more detailed scam-type labels.
2. **Identification of LLM Vulnerability to Adversarial Examples**:
   - Used the labeled dataset to evaluate model robustness under zero-shot and few-shot conditions.
   - Compared model accuracy on original and adversarial scam messages, revealing the degree of misclassification.
3. **Evaluation of LLM Performance in Scam Detection**:
   - Explored strategies for adversarial data augmentation; the results showed that specific adversarial prompt techniques can mitigate the impact of such attacks.

### Experimental Methods

- **Dataset Generation**:
  - Compiled a dataset of approximately 1,200 messages: original scam messages (530), adversarially modified scam messages (126), and non-scam messages (544).
  - Generated adversarial examples by removing common scam indicators and adjusting tone and language while retaining the key information.
- **Experimental Setup**:
  - Used two datasets: the general (original) scam messages and the adversarial scam messages.
  - Tested three LLMs: GPT-3.5 Turbo, Claude 3 Haiku, and LLaMA 3.1 8B Instruct.
  - Compared scam-detection performance under different prompt settings (zero-shot and few-shot).

### Experimental Results

- **Performance Comparison**:
  - GPT-3.5 Turbo performed best in all categories, showing higher resilience to adversarial modification.
  - LLaMA 3.1 8B Instruct performed worst, especially on adversarial scam messages.
  - Across categories, romance scam messages were the most sensitive to adversarial modification.
- **Case Study**:
  - A specific case showed how an original scam message can be modified into an adversarial version that bypasses detection.
  - The modified message added location information, adjusted the interest rate range and grace period, made the tone more formal, and updated the contact information, thus misleading the LLM.

### Conclusions

- The research shows that even small modifications can significantly reduce the accuracy of LLMs.
- Proposed strategies such as adding adversarial prompts and using few-shot learning to improve the robustness of LLMs.
- Emphasized the need to keep improving LLM training methods in order to build stronger scam-detection systems.
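
The evaluation described under Experimental Methods (zero-shot versus few-shot prompts, and accuracy on original versus adversarial messages) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the label set, the prompt wording, the function names (`build_prompt`, `accuracy`, `robustness_gap`), and the toy predictions are all assumptions made for this example. No model is called here; in practice the predictions would come from whichever LLM is being tested.

```python
# Hypothetical binary label set; the paper's fine-grained scam types
# are not listed in this summary, so they are not reproduced here.
LABELS = ["scam", "non-scam"]


def build_prompt(message, examples=()):
    """Build a classification prompt.

    With no examples this is a zero-shot prompt; passing
    (text, label) pairs turns it into a few-shot prompt.
    The wording is an assumption, not the paper's prompt.
    """
    parts = ["Classify the message as one of: " + ", ".join(LABELS) + "."]
    for text, label in examples:
        parts.append(f"Message: {text}\nLabel: {label}")
    parts.append(f"Message: {message}\nLabel:")
    return "\n\n".join(parts)


def accuracy(gold, predicted):
    """Fraction of predictions that match the gold labels."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def robustness_gap(orig_gold, orig_pred, adv_gold, adv_pred):
    """Accuracy on original messages minus accuracy on adversarial ones.

    A larger positive gap means the adversarial rewrites fooled
    the classifier more often.
    """
    return accuracy(orig_gold, orig_pred) - accuracy(adv_gold, adv_pred)


if __name__ == "__main__":
    # Made-up few-shot examples, for illustration only.
    shots = [
        ("You won a prize! Send your bank details now.", "scam"),
        ("Your dentist appointment is at 3 pm tomorrow.", "non-scam"),
    ]
    print(build_prompt("We offer low-interest loans, reply to apply.", shots))

    # Toy predictions standing in for model outputs.
    orig_gold = ["scam"] * 4
    orig_pred = ["scam", "scam", "scam", "scam"]
    adv_gold = ["scam"] * 4
    adv_pred = ["scam", "non-scam", "non-scam", "non-scam"]
    print("accuracy gap:", robustness_gap(orig_gold, orig_pred, adv_gold, adv_pred))
```

Reporting the gap per model and per prompt setting is one simple way to express the kind of comparison the summary describes; the paper itself may use different metrics.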

Exposing LLM Vulnerabilities: Adversarial Scam Detection and Performance
