What problem does this paper attempt to address?

This paper addresses jailbreak attacks on large language models (LLMs) in a multilingual setting. Specifically, it studies how an attacker can circumvent an LLM's safety filtering by translating malicious questions into other languages, causing the model to generate prohibited content. The risk is especially acute in multilingual settings because most existing safety mechanisms are designed primarily for English and offer weaker protection for other languages.

### Core Problems of the Paper

1. **Evaluation of the Effectiveness of Multilingual Jailbreak Attacks**: How do different LLMs perform when facing multilingual jailbreak attacks? Can they effectively identify and prevent these attacks?
2. **Analysis of Differences in Defense Mechanisms**: How do the defense mechanisms of LLMs differ across languages? Are certain languages more vulnerable to attack?
3. **Research on Mitigation Strategies**: How can the defenses of LLMs against multilingual jailbreak attacks be effectively strengthened?

### Main Contributions

- **Automated Multilingual Dataset Generation**: A semantic-preserving algorithm is proposed to automatically create a dataset of malicious questions covering nine languages.
- **Comprehensive Evaluation**: Multiple LLMs are evaluated against multilingual jailbreak attacks across different languages, model types, and prohibited scenarios.
- **Interpretability Analysis**: Techniques such as attention visualization and representation analysis are used to examine how LLMs behave when processing multilingual inputs.
- **Jailbreak Mitigation Method**: A fine-tuning method is developed and implemented that significantly improves the model's defenses and reduces the attack success rate by 96.2%.
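To make the attack-success-rate figure concrete, here is a minimal sketch of how such a metric and its reduction could be computed. The function names, the boolean outcome lists, and the reading of the reported figure as a relative reduction are all assumptions made for illustration; the summary does not state how the paper defines or measures it.

```python
def attack_success_rate(outcomes):
    """Fraction of attack attempts that succeeded.

    `outcomes` is a hypothetical list of booleans, one per attempt,
    where True means the model produced prohibited content.
    """
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    return sum(1 for succeeded in outcomes if succeeded) / len(outcomes)


def relative_reduction(asr_before, asr_after):
    """Relative drop in attack success rate (assumed interpretation)."""
    if asr_before <= 0:
        raise ValueError("asr_before must be positive")
    return (asr_before - asr_after) / asr_before


# Made-up outcomes for illustration only; these are not the paper's data.
before_finetuning = [True] * 8 + [False] * 2
after_finetuning = [True] * 1 + [False] * 9

asr_before = attack_success_rate(before_finetuning)
asr_after = attack_success_rate(after_finetuning)
print(f"ASR before: {asr_before:.1%}, after: {asr_after:.1%}")
print(f"Relative reduction: {relative_reduction(asr_before, asr_after):.1%}")
```

Under this reading, a large relative reduction means that most previously successful attacks no longer succeed after fine-tuning; an absolute (percentage-point) reading would give a different number for the same data.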
### Research Background and Motivation

As LLMs become widely deployed, their security issues have become increasingly prominent. In particular, jailbreak attacks, which bypass an LLM's safety mechanisms through carefully designed input prompts so that it generates inappropriate or harmful content, have become an important security challenge. Multilingual jailbreak attacks are an especially weak point in current research because most safety measures are designed for English and lack support for other languages.

### Method Overview

1. **Dataset Construction**: Use a semantic-preserving algorithm to translate the original English malicious questions into nine languages, and ensure translation quality through similarity filtering.
2. **Evaluation and Analysis**: Use the generated dataset to test multiple LLMs and evaluate their performance across languages and scenarios.
3. **Mitigation Strategy**: Strengthen the defenses of LLMs through fine-tuning to reduce the success rate of jailbreak attacks.

### Conclusion

This research reveals the jailbreak risks that LLMs face in a multilingual environment and provides insights and solutions for improving the security and reliability of these models. This helps promote the safe use of LLMs across a wider range of multilingual applications.
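The similarity-filtering step in the dataset construction can be sketched as follows. This is an illustrative stand-in, not the paper's algorithm: the summary does not say which similarity measure or threshold is used, so the sketch assumes a round-trip (back-translation) check scored with token-level Jaccard similarity. The `translate` and `back_translate` callables, the `threshold` value, and all function names are hypothetical.

```python
import re


def jaccard_similarity(a, b):
    """Token-level Jaccard similarity between two strings (stand-in metric)."""
    tokens_a = set(re.findall(r"\w+", a.lower()))
    tokens_b = set(re.findall(r"\w+", b.lower()))
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def filter_translations(questions, translate, back_translate, threshold=0.8):
    """Keep (original, translated) pairs whose round trip stays close to the original.

    `translate` and `back_translate` are caller-supplied callables
    (hypothetical here); no particular translation service is assumed.
    """
    kept = []
    for question in questions:
        translated = translate(question)
        round_trip = back_translate(translated)
        if jaccard_similarity(question, round_trip) >= threshold:
            kept.append((question, translated))
    return kept


# Toy translators for demonstration: upper-casing stands in for translation,
# and the back-translator corrupts one sentence to simulate a bad translation.
def toy_translate(text):
    return text.upper()


def toy_back_translate(text):
    return "something unrelated" if "WEATHER" in text else text.lower()


pairs = filter_translations(
    ["what is the capital of france", "how is the weather today"],
    toy_translate,
    toy_back_translate,
)
print(pairs)
```

In practice an embedding-based semantic similarity would likely be more robust than token overlap, since a good translation can round-trip to a paraphrase that shares few exact tokens with the original.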

A Cross-Language Investigation into Jailbreak Attacks in Large Language Models

Jailbreak Attacks and Defenses Against Large Language Models: A Survey

A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models

JailBreakV: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks

Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks

Jailbreaking and Mitigation of Vulnerabilities in Large Language Models

Figure it Out: Analyzing-based Jailbreak Attack on Large Language Models

Distract Large Language Models for Automatic Jailbreak Attack

Comprehensive Assessment of Jailbreak Attacks Against LLMs

Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction

Multilingual Jailbreak Challenges in Large Language Models

Investigating Coverage Criteria in Large Language Models: An In-Depth Study Through Jailbreak Attacks

Improved Large Language Model Jailbreak Detection via Pretrained Embeddings

PathSeeker: Exploring LLM Security Vulnerabilities with a Reinforcement Learning-Based Jailbreak Approach

Harnessing Task Overload for Scalable Jailbreak Attacks on Large Language Models

Playing Language Game with LLMs Leads to Jailbreaking

Jailbreaking LLMs with Arabic Transliteration and Arabizi

Jailbreaking Attack against Multimodal Large Language Model

Model-Editing-Based Jailbreak against Safety-aligned Large Language Models

EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models

Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring