Aligning Large Language Models with Human: A Survey

Yufei Wang, Wanjun Zhong, Liangyou Li, Fei Mi, Xingshan Zeng, Wenyong Huang, Lifeng Shang, Xin Jiang, Qun Liu
2023-07-25
Abstract: Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks. Despite their notable performance, these models are prone to certain limitations such as misunderstanding human instructions, generating potentially biased content, or factually incorrect (hallucinated) information. Hence, aligning LLMs with human expectations has become an active area of interest within the research community. This survey presents a comprehensive overview of these alignment technologies, including the following aspects. (1) Data collection: the methods for effectively collecting high-quality instructions for LLM alignment, including the use of NLP benchmarks, human annotations, and leveraging strong LLMs. (2) Training methodologies: a detailed review of the prevailing training methods employed for LLM alignment. Our exploration encompasses Supervised Fine-tuning, both Online and Offline human preference training, along with parameter-efficient training mechanisms. (3) Model Evaluation: the methods for evaluating the effectiveness of these human-aligned LLMs, presenting a multifaceted approach towards their assessment. In conclusion, we collate and distill our findings, shedding light on several promising future research avenues in the field. This survey, therefore, serves as a valuable resource for anyone invested in understanding and advancing the alignment of LLMs to better suit human-oriented tasks and expectations. An associated GitHub repository collecting the latest papers is available at https://github.com/GaryYufei/AlignLLMHumanSurvey.
Computation and Language
What problem does this paper attempt to address?
The paper addresses the problem of aligning large language models (LLMs) with human expectations. Although LLMs perform well on a broad range of natural language processing (NLP) tasks, they still have limitations: they can misunderstand human instructions, generate potentially biased content, or produce factually incorrect (hallucinated) information. Aligning these models with human expectations has therefore become an active research area. The survey reviews the field from three main aspects:
1. **Data Collection**: Methods for effectively collecting high-quality instructions for alignment, including the use of NLP benchmarks, human annotation, and leveraging strong LLMs to generate instructions.
2. **Training Methods**: A detailed review of the prevailing training methods for aligning LLMs, covering supervised fine-tuning, online and offline human preference training, and parameter-efficient training mechanisms.
3. **Model Evaluation**: Methods for evaluating the effectiveness of human-aligned LLMs, presented as a multifaceted approach to assessment.
Drawing on this review, the paper distills its findings and points out several promising directions for future research, aiming to serve as a resource for researchers and practitioners who wish to understand and advance the alignment of LLMs with human-oriented tasks and expectations.