Abstract: Large Language Models (LLMs) such as ChatGPT that exhibit generative AI capabilities are seeing accelerated adoption and innovation. The increased presence of Generative AI (GAI) inevitably raises concerns about the risks and safety associated with these models. This article provides an up-to-date survey of recent trends in AI safety research of GAI-LLMs from a computer scientist's perspective: specific and technical. In this survey, we explore the background and motivation for the identified harms and risks in the context of LLMs being generative language models; our survey differentiates itself by emphasising the need for unified theories of the distinct safety challenges in the research, development, and applications of LLMs. We start our discussion with a concise introduction to the workings of LLMs, supported by relevant literature. Then we discuss earlier research that has pointed out the fundamental constraints of generative models, or the lack of understanding thereof (e.g., performance and safety trade-offs as LLMs scale in number of parameters). We provide sufficient coverage of LLM alignment, delving into various approaches, contending methods, and present challenges associated with aligning LLMs with human preferences. By highlighting the gaps in the literature and possible implementation oversights, our aim is to create a comprehensive analysis that provides insights for addressing AI safety in LLMs and encourages the development of aligned and secure models. We conclude our survey by discussing future directions of LLMs for AI safety, offering insights into ongoing research in this critical area.

What problem does this paper attempt to address?

The paper primarily explores the challenges and research progress of large language models (LLMs) in the field of AI safety. The authors, from the perspective of computer scientists, provide a detailed review of the safety of generative AI large language models, with a particular emphasis on technical and specific content. The core contributions of the paper are:

1. **Systematically investigating safety issues in LLMs**: Discussing them through a novel component-based framework, covering safety issues in training data, model training, prompting, alignment, and scalability.
2. **Associating identified risks with specific LLM methodologies**: Particularly in-context learning, prompting techniques, and reinforcement learning, to more precisely understand the technical roots of safety issues.
3. **Comprehensively analyzing prompting techniques and alignment technologies in LLMs**: This helps bridge the gap between theoretical safety concerns and practical evaluation methods.
4. **Positioning the discussion of model alignment**: Placing it within the broader AI safety literature, exploring different philosophical perspectives, and how language models can safely interact with other AI agents.
5. **Adopting a reductionist approach**: Combining various viewpoints from current literature to propose a unique and organized framework for accurately identifying and addressing safety issues in LLMs.

In this way, the paper not only integrates existing knowledge about LLM safety but also provides researchers and practitioners with a structured approach to identify and address safety issues in different applications and domains, especially regarding the expansion of parameter scales and the emerging capabilities that come with it.

In summary, this paper delves into the various safety challenges faced by large language models and proposes specific solutions and technical pathways, aiming to promote further development and improvement in this field.

AI Safety in Generative AI Large Language Models: A Survey

Social Risks in the Era of Generative AI

The global landscape of academic guidelines for generative AI and Large Language Models

Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey)

Safety Assessment of Chinese Large Language Models

Adopting Generative AI with Precaution in Dentistry: A Review and Reflection

Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey

Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine

A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly

Current state of LLM Risks and AI Guardrails

Safeguarding Large Language Models: A Survey

ChatGPT Alternative Solutions: Large Language Models Survey

A Survey on Responsible Generative AI: What to Generate and What Not

ChatGPT in the Age of Generative AI and Large Language Models: A Concise Survey

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation

Generative AI in EU Law: Liability, Privacy, Intellectual Property, and Cybersecurity

Amplifying Limitations, Harms and Risks of Large Language Models

Cutting Through the Confusion and Hype: Understanding the True Potential of Generative AI

Large Language Models in Law: A Survey