Exposing LLM Vulnerabilities: Adversarial Scam Detection and Performance

Chen-Wei Chang, Shailik Sarkar, Shutonu Mitra, Qi Zhang, Hossein Salemi, Hemant Purohit, Fengxiu Zhang, Michin Hong, Jin-Hee Cho, Chang-Tien Lu
2024-12-01
Abstract: Can we trust Large Language Models (LLMs) to accurately predict scams? This paper investigates the vulnerabilities of LLMs when facing adversarial scam messages for the task of scam detection. We addressed this issue by creating a comprehensive dataset with fine-grained labels of scam messages, including both original and adversarial scam messages. The dataset extended traditional binary classes for the scam detection task into more nuanced scam types. Our analysis showed how adversarial examples took advantage of vulnerabilities of an LLM, leading to a high misclassification rate. We evaluated the performance of LLMs on these adversarial scam messages and proposed strategies to improve their robustness.
Cryptography and Security, Artificial Intelligence, Computers and Society
What problem does this paper attempt to address?
This paper addresses the vulnerability of large language models (LLMs) to adversarial scam messages. Specifically, it examines how much LLM scam-detection performance degrades on adversarially rewritten messages, using a purpose-built dataset that contains both original and adversarial scam messages. The research aims to show how adversarial examples exploit weaknesses in LLMs, leading to a high misclassification rate, and proposes strategies to improve model robustness.

### Main Contributions

1. **Creation of a Comprehensive Scam Dataset**:
   - Developed a carefully labeled dataset of scam messages, including both original and adversarial versions.
   - The dataset extends the traditional binary classification and introduces finer-grained scam-type labels.
2. **Identification of LLM Vulnerability to Adversarial Examples**:
   - Used the labeled dataset to evaluate model robustness under zero-shot and few-shot conditions.
   - Compared model accuracy on original and adversarial scam messages, revealing the degree of misclassification.
3. **Evaluation of LLM Performance in Scam Detection**:
   - Explored strategies for adversarial data augmentation; the results showed that specific adversarial prompting techniques can mitigate the impact of such attacks.

### Experimental Methods

- **Dataset Generation**:
  - Compiled a dataset of approximately 1,200 messages: original scam messages (530), adversarially modified scam messages (126), and non-scam messages (544).
  - Generated adversarial examples by removing common scam indicators and adjusting tone and language while retaining the key information.
- **Experimental Setup**:
  - Used two datasets: general scam messages and adversarial scam messages.
  - Tested three LLMs: GPT-3.5 Turbo, Claude 3 Haiku, and LLaMA 3.1 8B Instruct.
  - Compared scam-detection performance under different prompt settings (zero-shot and few-shot).

### Experimental Results

- **Performance Comparison**:
  - GPT-3.5 Turbo performed best in all categories, showing higher resilience to adversarial modification.
  - LLaMA 3.1 8B Instruct performed worst, especially on adversarial scam messages.
  - Across categories, romance scam messages were the most sensitive to adversarial modification.
- **Case Study**:
  - Showed, through a specific case, how an original scam message can be rewritten into an adversarial version that bypasses detection.
  - The modified message added location information, adjusted the interest rate range and grace period, made the tone more formal, and updated the contact information, thereby misleading the LLM.

### Conclusions

- The research shows that even small modifications can significantly reduce the accuracy of LLMs.
- Proposed strategies such as adding adversarial prompts and using few-shot learning to improve the robustness of LLMs.
- Emphasized the need to keep improving LLM training methods in order to build more robust scam-detection systems.
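The evaluation described above (one classifier, scored separately on original and adversarial scam messages, under a zero-shot or few-shot prompt) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: `build_prompt`, `accuracy`, and `keyword_stub` are hypothetical names, the binary label set is a simplification of the paper's finer-grained scam types, the sample messages are invented, and the keyword-based stub merely stands in for a real LLM call.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, str]  # (message text, gold label); labels here are an assumption


def build_prompt(message: str, few_shot: Sequence[Example] = ()) -> str:
    """Build a zero-shot prompt, or a few-shot prompt if labeled examples are given."""
    parts = ["Classify the message as 'scam' or 'non-scam'. Answer with the label only."]
    for text, label in few_shot:
        parts.append(f"Message: {text}\nLabel: {label}")
    parts.append(f"Message: {message}\nLabel:")
    return "\n\n".join(parts)


def accuracy(
    classify: Callable[[str], str],
    data: Sequence[Example],
    few_shot: Sequence[Example] = (),
) -> float:
    """Fraction of messages whose predicted label matches the gold label."""
    correct = sum(
        classify(build_prompt(msg, few_shot)).strip().lower() == label
        for msg, label in data
    )
    return correct / len(data)


def keyword_stub(prompt: str) -> str:
    """Stand-in for an LLM: flags the final message only if it has obvious scam cues."""
    target = prompt.rsplit("Message:", 1)[1].lower()
    return "scam" if any(k in target for k in ("urgent", "wire", "prize")) else "non-scam"


# Invented toy messages; the adversarial versions drop the obvious indicators
# and use a calmer, more formal tone while keeping the same request.
original = [
    ("URGENT: wire $500 now to claim your prize!", "scam"),
    ("Your account is locked, urgent action required.", "scam"),
]
adversarial = [
    ("Hello, please send the $500 processing fee at your convenience to receive your award.", "scam"),
    ("We noticed an issue with your account; kindly verify your details.", "scam"),
]

acc_original = accuracy(keyword_stub, original)
acc_adversarial = accuracy(keyword_stub, adversarial)
print(f"original: {acc_original:.2f}  adversarial: {acc_adversarial:.2f}  "
      f"drop: {acc_original - acc_adversarial:.2f}")
```

To try this against a real model, `keyword_stub` would be replaced by a function that sends the prompt to the model and returns its text reply; passing a few labeled adversarial examples as `few_shot` corresponds to the few-shot setting the summary mentions. The numbers printed for the toy data say nothing about the paper's reported results.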