The Importance of Human-Labeled Data in the Era of LLMs

Yang Liu
DOI: https://doi.org/10.48550/arXiv.2306.14910
2023-06-18
Computation and Language
Abstract: The advent of large language models (LLMs) has brought about a revolution in the development of tailored machine learning models and sparked debates on redefining data requirements. The automation facilitated by the training and implementation of LLMs has led to discussions and aspirations that human-level labeling interventions may no longer hold the same level of importance as in the era of supervised learning. This paper presents compelling arguments supporting the ongoing relevance of human-labeled data in the era of LLMs.
What problem does this paper attempt to address?
The paper asks whether human-annotated data still matters in the era of large language models (LLMs). Because LLMs are pre-trained mainly on unstructured, unlabeled Internet data, their rise has prompted debate over whether human annotation can be reduced or avoided altogether. The paper argues that, although LLMs perform well on tasks such as text classification and multi-modal input processing, they still risk producing wrong answers, hallucinated content, and harmful information. The author therefore offers several arguments for the continued importance of human-annotated data in the LLM era:

1. **Quality Control**: Even the most advanced LLMs may perform worse than well-trained human annotators on certain tasks. For example, on a text classification task, GPT-4 reaches 93% accuracy, while a well-trained human annotator reaches 95.3%.
2. **Safety and Compliance**: LLMs may generate dangerous, violent, or unethical content. Ensuring that a model is safe and reliable requires human-annotated data for alignment, in particular fine-tuning through Reinforcement Learning from Human Feedback (RLHF) so that the model produces more helpful, harmless, and truthful output.
3. **Risk Control**: Strict control of model risk requires fine-grained labels for different types of alignment. Different geopolitical regions may have different policies on the acceptable level of violence in content, and regions with different religious traditions may prefer different answers. Alignment therefore needs to be tailored to the specific requirements of each region.
4. **Evaluation and Trust Building**: The safe deployment of LLMs depends on comprehensive evaluation. Multifaceted evaluation helps to identify potential safety problems and keep deployment low-risk, and it also serves as a way to earn user trust.
5. **Human-Machine Collaboration**: The author envisions a hybrid system in which LLMs and human decision-makers co-evolve. In such a system, the model can answer "I don't know" when it is uncertain and leave the decision to humans. Human decisions can then be fed back into the system to improve the calibration of the model's output.

In conclusion, the main purpose of the paper is to emphasize that human-annotated data retains irreplaceable value as large language models become more widespread, and to propose a series of measures and suggestions to ensure the safety and effectiveness of these models.
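The hybrid system described under Human-Machine Collaboration can be illustrated with a minimal sketch. It assumes the model exposes a probability per label and that a fixed confidence threshold decides when to abstain. The names `HybridLabeler`, `threshold`, and `ABSTAIN` are hypothetical and do not come from the paper, which does not prescribe a specific mechanism.

```python
from dataclasses import dataclass, field

# Hypothetical sentinel returned when the model defers to a human.
ABSTAIN = "I don't know"


@dataclass
class HybridLabeler:
    # Assumed confidence cut-off; a real system would tune this on held-out data.
    threshold: float = 0.8
    # Collected (model probabilities, human label) pairs for later recalibration.
    human_feedback: list = field(default_factory=list)

    def decide(self, probs: dict[str, float]) -> str:
        """Return the most likely label, or abstain if confidence is too low."""
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        return label if confidence >= self.threshold else ABSTAIN

    def record_human_label(self, probs: dict[str, float], human_label: str) -> None:
        """Store the human decision so it can be used to improve calibration."""
        self.human_feedback.append((probs, human_label))


labeler = HybridLabeler(threshold=0.8)
confident = {"positive": 0.95, "negative": 0.05}
uncertain = {"positive": 0.55, "negative": 0.45}

print(labeler.decide(confident))
if labeler.decide(uncertain) == ABSTAIN:
    # The uncertain case is routed to a human, whose answer is kept as feedback.
    labeler.record_human_label(uncertain, "negative")
```

A fixed threshold is only one possible abstention rule. The stored feedback is the piece that matters for the paper's argument, since it is the human-labeled data that would drive recalibration.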