Abstract: As generative AI, particularly large language models (LLMs), becomes increasingly integrated into production applications, new attack surfaces and vulnerabilities emerge and put a focus on adversarial threats in natural language and multi-modal systems. Red-teaming has gained importance in proactively identifying weaknesses in these systems, while blue-teaming works to protect against such adversarial attacks. Despite growing academic interest in adversarial risks for generative AI, there is limited guidance tailored for practitioners to assess and mitigate these challenges in real-world environments. To address this, our contributions include: (1) a practical examination of red- and blue-teaming strategies for securing generative AI, (2) identification of key challenges and open questions in defense development and evaluation, and (3) the Attack Atlas, an intuitive framework that brings a practical approach to analyzing single-turn input attacks, placing it at the forefront for practitioners. This work aims to bridge the gap between academic insights and practical security measures for the protection of generative AI systems.

What problem does this paper attempt to address?

This paper addresses the security challenges and adversarial threats facing generative AI (GenAI), especially large language models (LLMs), in practical applications. Specifically, the authors focus on:

1. **Identifying and dealing with new attack surfaces**: As GenAI is deployed more widely, new attack surfaces and vulnerabilities keep emerging. Attacks can exploit weaknesses in natural-language and multi-modal systems, threatening their security and reliability.
2. **Red-teaming and blue-teaming strategies**:
   - **Red team**: Identifies vulnerabilities and weaknesses in the system through active probing and simulated attacks.
   - **Blue team**: Designs and implements defensive measures to protect the system from adversarial attacks.
3. **Lack of practical guidance**: Although the academic community has studied the adversarial risks of generative AI, practitioners still lack specific guidance and tools. The authors therefore aim to help practitioners assess and mitigate these risks by providing practical perspectives and methods.
4. **Attack Atlas framework**: The paper introduces an intuitive, organized classification system for analyzing single-turn input attack vectors. The framework is meant to give practitioners a clear guide for understanding and handling different types of attacks.

### Main contributions

- **Red-team and blue-team strategies from a practical perspective**: Compared with traditional adversarial machine learning and responsible AI methods, the paper provides practical insights into red-team and blue-team strategies for generative AI.
- **List of open problems and challenges**: The paper lists key unsolved problems in generative AI security, especially in defense development, evaluation methods, and benchmarking of red-team and blue-team techniques.
- **Attack Atlas framework**: The paper proposes a new classification system covering a wide range of single-turn input attack vectors, helping practitioners analyze and handle potential threats more systematically.

### Summary

By combining theory and practice, this paper aims to bridge the gap between academic research and practical application, providing practical guidance and tools for securing generative AI systems. This helps improve the robustness and security of such systems and lays a foundation for future security research and practice.

Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI
