AI Safety in Generative AI Large Language Models: A Survey

Jaymari Chua, Yun Li, Shiyi Yang, Chen Wang, Lina Yao
2024-07-06
Abstract: Large Language Models (LLMs) such as ChatGPT that exhibit generative AI capabilities are facing accelerated adoption and innovation. The increased presence of Generative AI (GAI) inevitably raises concerns about the risks and safety associated with these models. This article provides an up-to-date survey of recent trends in AI safety research of GAI-LLMs from a computer scientist's perspective: specific and technical. In this survey, we explore the background and motivation for the identified harms and risks in the context of LLMs being generative language models; our survey differentiates itself by emphasising the need for unified theories of the distinct safety challenges in the research, development, and applications of LLMs. We start our discussion with a concise introduction to the workings of LLMs, supported by relevant literature. Then we discuss earlier research that has pointed out the fundamental constraints of generative models, or lack of understanding thereof (e.g., performance and safety trade-offs as LLMs scale in number of parameters). We provide sufficient coverage of LLM alignment, delving into various approaches, contending methods, and present challenges associated with aligning LLMs with human preferences. By highlighting the gaps in the literature and possible implementation oversights, our aim is to create a comprehensive analysis that provides insights for addressing AI safety in LLMs and encourages the development of aligned and secure models. We conclude our survey by discussing future directions of LLMs for AI safety, offering insights into ongoing research in this critical area.
Computers and Society, Computation and Language
What problem does this paper attempt to address?
The paper surveys the challenges and research progress of large language models (LLMs) in the field of AI safety. Writing from a computer scientist's perspective, the authors review the safety of generative AI LLMs with an emphasis on specific, technical content. The core contributions of the paper are:

1. **Systematically investigating safety issues in LLMs**: Discussing them through a novel component-based framework that covers safety issues in training data, model training, prompting, alignment, and scaling.
2. **Associating identified risks with specific LLM methodologies**: Particularly in-context learning, prompting techniques, and reinforcement learning, to understand the technical roots of safety issues more precisely.
3. **Comprehensively analyzing prompting techniques and alignment technologies in LLMs**: This helps bridge the gap between theoretical safety concerns and practical evaluation methods.
4. **Positioning the discussion of model alignment**: Placing it within the broader AI safety literature, exploring different philosophical perspectives, and examining how language models can safely interact with other AI agents.
5. **Adopting a reductionist approach**: Combining various viewpoints from the current literature to propose an organized framework for accurately identifying and addressing safety issues in LLMs.

In this way, the paper not only integrates existing knowledge about LLM safety but also gives researchers and practitioners a structured approach to identifying and addressing safety issues across applications and domains, especially as parameter counts grow and new capabilities emerge with scale.

In summary, the paper examines the various safety challenges faced by large language models and proposes specific solutions and technical pathways, aiming to promote further development and improvement in this field.