Response Tuning: Aligning Large Language Models without Instruction

Seokhyun An,Hyounghun Kim
2024-10-03
Abstract:Instruction tuning-supervised fine-tuning using instruction-response pairs-is a foundational step in transitioning pre-trained Large Language Models (LLMs) into helpful and safe chat assistants. Our hypothesis is that establishing an adequate output space can enable such a transition given the capabilities inherent in pre-trained LLMs. To verify this, we propose Response Tuning (RT), which eliminates the instruction-conditioning step in instruction tuning and solely focuses on response space supervision. Our experiments demonstrate that RT models, trained only using responses, can effectively respond to a wide range of instructions and exhibit helpfulness comparable to that of their instruction-tuned counterparts. Furthermore, we observe that controlling the training response distribution can significantly improve their user preference or elicit target behaviors such as refusing assistance for unsafe queries. Our findings illuminate the role of establishing an adequate output space in alignment, highlighting the potential of the extensive inherent capabilities of pre-trained LLMs.
Computation and Language,Artificial Intelligence
What problem does this paper attempt to address?
The problem that this paper attempts to solve is: how to make pre - trained large - scale language models (LLMs) become useful and safe chat assistants without instruction tuning by establishing an appropriate output space (i.e., response space). Specifically, the paper proposes a method named "Response Tuning" (RT). This method omits the instruction - conditioning step in the traditional instruction - response pair tuning process and only focuses on the supervision of the response space. The author assumes that pre - trained LLMs already have the capabilities such as following instructions and evaluating safety. By appropriately setting the response space, these capabilities can be stimulated, enabling the model to respond effectively to various instructions like an instruction - tuned model and show similar helpfulness. The main contributions of the paper include: 1. Proposing the Response Tuning (RT) method and verifying that pre - trained LLMs can generate responses consistent with human needs only by establishing an appropriate output space. 2. Through extensive experimental evaluations, demonstrating the effectiveness of the RT model in handling a wide range of instructions, indicating that most of the instruction - following capabilities may have been learned during the pre - training stage. 3. Proving that by controlling the training response distribution, the user preference and safety of the model can be further improved. For example, by refining response attributes or adding a small number of safe - rejection examples, the performance of the model can be significantly improved. 4. Emphasizing the importance of the inherent capabilities of pre - trained LLMs and the role of establishing an appropriate response space during the adjustment process. Overall, this paper explores how, in the absence of instruction - response pairs, large - scale language models can better adapt to human needs through the supervision of the response space while maintaining their safety and usefulness.