Robustness Evaluation of Cloud-Deployed Large Language Models Against Chinese Adversarial Text Attacks

Yunting Zhang,Lin Ye,Baisong Li,Hongli Zhang
DOI: https://doi.org/10.1109/cloudnet59005.2023.10490064
2023-01-01
Abstract:In the evolving digital realm, Large Language Models (LLMs) like ChatGPT, which recently achieved state-of-the-art results across diverse NLP tasks, are extensively used. Deployed on the cloud, ChatGPT allows interaction via its API, providing rich and high-quality solutions. However, its vulnerability to adversarial attacks, potentially compromising the quality and reliability of cloud services and leading to information leakage, raises security concerns. Investigating the robustness of ChatGPT against adversarial attacks enables a preliminary understanding of its weaknesses and facilitates the subsequent integration of targeted defensive mechanisms into the cloud framework. Most current research on the robustness of LLMs against adversarial attacks focuses on BERT, with few studies on ChatGPT under similar conditions. This paper explores the robustness of ChatGPT against Chinese adversarial text attacks in text classification tasks and proposes a ChatGPT-based adversarial text fluency evaluation method that eliminates the need for human involvement. Experiments conducted on the real-world dataset, THUCNews, examined the robustness of Chinese BERT and ChatGPT against adversarial attacks generated via various Chinese adversarial text generation methods. A multidimensional assessment revealed that both models are susceptible to attacks, leading to decreased text classification accuracy. The attack success rate on ChatGPT reached nearly 45%.
What problem does this paper attempt to address?