LIMA: Less Is More for Alignment

Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy
2023-05-19
Abstract: Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences. We measure the relative importance of these two stages by training LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. LIMA demonstrates remarkably strong performance, learning to follow specific response formats from only a handful of examples in the training data, including complex queries that range from planning trip itineraries to speculating about alternate history. Moreover, the model tends to generalize well to unseen tasks that did not appear in the training data. In a controlled human study, responses from LIMA are either equivalent or strictly preferred to GPT-4 in 43% of cases; this statistic is as high as 58% when compared to Bard and 65% versus DaVinci003, which was trained with human feedback. Taken together, these results strongly suggest that almost all knowledge in large language models is learned during pretraining, and only limited instruction tuning data is necessary to teach models to produce high quality output.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The core question this paper addresses is: **Does aligning a large language model require large amounts of instruction-tuning data and reinforcement learning?** Specifically, the authors hypothesize that almost all knowledge and capabilities are acquired during pre-training, and that alignment mainly teaches the model which format or style to use when interacting with users. To test this hypothesis, they train a model named LIMA (Less Is More for Alignment).

### Main Research Questions

1. **Importance of Pre-training**: Are the knowledge and capabilities of large language models mainly acquired during the pre-training stage?
2. **Role of a Small Amount of High-Quality Data**: Is a small amount of carefully curated instruction-tuning data sufficient for the model to produce high-quality outputs?
3. **Effectiveness of Alignment Methods**: Can simple supervised fine-tuning match the effect of large-scale instruction tuning and reinforcement learning?

### Research Methods

- **Model Selection**: A pre-trained LLaMa language model with 65B parameters.
- **Dataset Construction**: 1,000 carefully curated, high-quality prompt-response pairs covering diverse tasks and styles.
- **Experimental Setup**: The model is fine-tuned with the standard supervised loss, without any reinforcement learning or human preference modeling.
- **Evaluation Method**: LIMA is compared with other state-of-the-art language models (such as GPT-4, Claude, Bard, and DaVinci003) through human evaluation and with GPT-4 as an evaluator.

### Key Findings

- **Importance of Pre-training**: The results suggest that almost all knowledge is acquired during the pre-training stage, and that a small amount of high-quality instruction-tuning data is sufficient for the model to generate high-quality outputs.
- **Effectiveness of Simple Fine-tuning**: In the controlled human study, LIMA's responses are equivalent or strictly preferred in 65% of cases versus DaVinci003, which was trained with human feedback, and in 58% of cases versus Bard.
- **Generalization Ability**: LIMA tends to generalize well to tasks that did not appear in its training data, and its responses are equivalent or strictly preferred to GPT-4's in 43% of cases.

### Conclusion

The authors take these results as strong support for the "Superficial Alignment Hypothesis": alignment mainly teaches the model which format or style to use when interacting with users, and does not require large amounts of instruction-tuning data or complex reinforcement-learning methods. This offers a new perspective for future research, emphasizing the importance of pre-training and the value of small, high-quality datasets.

### Formula Representation

The paper does not rely on specific mathematical formulas, but its key training settings can be written as follows:

- Model Parameter Count: \( 65 \text{B} \)
- Number of Training Epochs: \( 15 \text{ epochs} \)
- Learning Rate: Initial \( 1\times 10^{-5} \), linearly decaying to \( 1\times 10^{-6} \)
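The learning-rate setting listed above (initial \( 1\times 10^{-5} \), linearly decaying to \( 1\times 10^{-6} \) over 15 epochs) can be illustrated with a minimal Python sketch. This is not the authors' training code: the function name `linear_decay_lr`, the per-step granularity of the decay, and the batch size of 32 are assumptions made for illustration, since this summary does not state them.

```python
import math

INITIAL_LR = 1e-5     # initial learning rate, from the summary above
FINAL_LR = 1e-6       # final learning rate, from the summary above
NUM_EPOCHS = 15       # number of epochs, from the summary above
NUM_EXAMPLES = 1000   # size of the curated training set
BATCH_SIZE = 32       # assumption for illustration; not stated in this summary


def linear_decay_lr(step: int, total_steps: int,
                    initial_lr: float = INITIAL_LR,
                    final_lr: float = FINAL_LR) -> float:
    """Linearly interpolate from initial_lr at step 0 to final_lr at the last step.

    Hypothetical helper: decaying per optimizer step (rather than per epoch)
    is an assumption.
    """
    if total_steps < 2:
        return initial_lr
    # Clamp so out-of-range steps do not extrapolate past the endpoints.
    step = max(0, min(step, total_steps - 1))
    fraction = step / (total_steps - 1)
    return initial_lr + fraction * (final_lr - initial_lr)


steps_per_epoch = math.ceil(NUM_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS

if __name__ == "__main__":
    # Print the learning rate at the start of each epoch.
    for epoch in range(NUM_EPOCHS):
        lr = linear_decay_lr(epoch * steps_per_epoch, total_steps)
        print(f"epoch {epoch + 1:2d}: lr = {lr:.3e}")
```

Under these assumptions the schedule spans `total_steps` optimizer updates; a different batch size changes only the number of steps, not the endpoints of the decay.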