Large Language Model Alignment: A Survey

Tianhao Shen, Renren Jin, Yufei Huang, Chuang Liu, Weilong Dong, Zishan Guo, Xinwei Wu, Yan Liu, Deyi Xiong

2023-09-26

Abstract: Recent years have witnessed remarkable progress made in large language models (LLMs). Such advancements, while garnering significant attention, have concurrently elicited various concerns. The potential of these models is undeniably vast; however, they may yield texts that are imprecise, misleading, or even detrimental. Consequently, it becomes paramount to employ alignment techniques to ensure that these models exhibit behaviors consistent with human values. This survey endeavors to furnish an extensive exploration of alignment methodologies designed for LLMs, in conjunction with the extant capability research in this domain. Adopting the lens of AI alignment, we categorize the prevailing methods and emergent proposals for the alignment of LLMs into outer and inner alignment. We also probe into salient issues including the models' interpretability and potential vulnerabilities to adversarial attacks. To assess LLM alignment, we present a wide variety of benchmarks and evaluation methodologies. After discussing the state of alignment research for LLMs, we finally cast a vision toward the future, contemplating the promising avenues of research that lie ahead. Our aspiration for this survey extends beyond merely spurring research interests in this realm. We also envision bridging the gap between the AI alignment research community and the researchers engrossed in the capability exploration of LLMs, so that LLMs can be both capable and safe.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

The paper primarily explores the ethical and social risks that large language models (LLMs) pose as they develop rapidly, and lays out a comprehensive alignment framework for keeping the behavior of these models consistent with human values. Specifically:

1. **Background and Motivation**: With the development of LLMs such as ChatGPT and GPT-4, their performance on many tasks is approaching or even surpassing human levels. However, these models may also generate harmful information, leak private data, or produce misleading content, thereby posing social and ethical risks.
2. **Social and Ethical Risks of LLMs**:
   - **Content Generation Issues**: LLMs may generate biased, toxic, or sensitive content, especially content reflecting gender, cultural, and social biases.
   - **Malicious Use and Negative Impact**: LLMs may be used for illegal purposes such as creating fake news and writing code for cyberattacks. In addition, large-scale deployment of LLMs may change the labor market and raise environmental issues.
3. **Potential Risks of Advanced LLMs**: As the technology advances, future LLMs may exhibit characteristics such as self-awareness, deceptive behavior, self-preservation tendencies, and power-seeking, all of which could bring unforeseen risks.
4. **Concept of LLM Alignment**: To address the above challenges, the paper defines LLM alignment as ensuring that the model's goals are consistent with human values. This covers outer alignment (choosing the correct loss function or reward function) and inner alignment (ensuring that training actually produces a model that pursues the goals set by its designers).

By constructing this framework, the authors hope to promote LLM research that not only enhances capabilities but also focuses on safety and reliability, ensuring that future LLMs can develop in a manner consistent with human values.

Aligning Large Language Models with Human: A Survey

Towards Scalable Automated Alignment of LLMs: A Survey

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More

ABC Align: Large Language Model Alignment for Safety & Accuracy

Large Language Model Safety: A Holistic Survey

Towards a Unified View of Preference Learning for Large Language Models: A Survey

From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models

Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas: A Survey

Evaluating Large Language Models: A Comprehensive Survey

A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers

A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

A Survey of Large Language Models

A Survey on Evaluation of Large Language Models

LLM-Align: Utilizing Large Language Models for Entity Alignment in Knowledge Graphs

Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges

Exploring the Nexus of Large Language Models and Legal Systems: A Short Survey