Aligning Large Language Models with Human: A Survey

Yufei Wang, Wanjun Zhong, Liangyou Li, Fei Mi, Xingshan Zeng, Wenyong Huang, Lifeng Shang, Xin Jiang, Qun Liu

2023-07-25

Abstract: Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks. Despite their notable performance, these models are prone to certain limitations such as misunderstanding human instructions, generating potentially biased content, or factually incorrect (hallucinated) information. Hence, aligning LLMs with human expectations has become an active area of interest within the research community. This survey presents a comprehensive overview of these alignment technologies, including the following aspects. (1) Data collection: the methods for effectively collecting high-quality instructions for LLM alignment, including the use of NLP benchmarks, human annotations, and leveraging strong LLMs. (2) Training methodologies: a detailed review of the prevailing training methods employed for LLM alignment. Our exploration encompasses Supervised Fine-tuning, both Online and Offline human preference training, along with parameter-efficient training mechanisms. (3) Model Evaluation: the methods for evaluating the effectiveness of these human-aligned LLMs, presenting a multifaceted approach towards their assessment. In conclusion, we collate and distill our findings, shedding light on several promising future research avenues in the field. This survey, therefore, serves as a valuable resource for anyone invested in understanding and advancing the alignment of LLMs to better suit human-oriented tasks and expectations. An associated GitHub link collecting the latest papers is available at https://github.com/GaryYufei/AlignLLMHumanSurvey.

Computation and Language

What problem does this paper attempt to address?

The paper addresses the alignment of large language models (LLMs) with human expectations. Although LLMs perform well on natural language processing (NLP) tasks, they still have limitations, such as misunderstanding human instructions, generating potentially biased content, or producing factually incorrect (hallucinated) information. The research community has therefore shown strong interest in making these models follow human instructions more reliably and align with human expectations. The paper reviews the field from three main aspects:

1. **Data Collection**: Methods for effectively collecting high-quality instructions for alignment, including the use of NLP benchmarks, human annotation, and leveraging strong LLMs to generate training instructions.
2. **Training Methods**: A detailed review of the prevailing training methods for aligning LLMs, covering supervised fine-tuning, online and offline human preference training, and parameter-efficient training mechanisms.
3. **Model Evaluation**: Methods for evaluating the effectiveness of these aligned LLMs, presented as a multifaceted approach to their assessment.

Drawing on this analysis of existing research, the paper points out several promising directions for future work and aims to serve as a resource for researchers and practitioners who wish to understand and advance the alignment of LLMs with human-oriented tasks and expectations.

Large Language Model Alignment: A Survey

Towards Scalable Automated Alignment of LLMs: A Survey

A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

Towards a Unified View of Preference Learning for Large Language Models: A Survey

A Survey on Human Preference Learning for Large Language Models

From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models

A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers

A Survey on Evaluation of Large Language Models

Understanding the Learning Dynamics of Alignment with Human Feedback

Human-Instruction-Free LLM Self-Alignment with Limited Samples

AlignBench: Benchmarking Chinese Alignment of Large Language Models

A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias

A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations

Multilingual Large Language Models: A Systematic Survey

Evaluating Large Language Models: A Comprehensive Survey

MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time

Large Language Models for Data Annotation: A Survey