Abstract: Instruction fine-tuning is crucial for today's large language models (LLMs) to learn to follow instructions and align with human preferences. Conventionally, supervised data, including the instruction and the correct response, is required for instruction fine-tuning. To obtain such data, some researchers prompted well-trained models like GPT-4 to generate instructions and correct responses. In this paper, we propose a novel approach that uses the first half of a random text from OpenWebText as the instruction and GPT-3.5-turbo or GPT-4-turbo to complete the text as the response. Despite the data being "non-instructional", we found that pre-trained LLMs fine-tuned on this data can gain instruction-following capabilities. This observation is verified by fine-tuning several well-known pre-trained LLMs (e.g., LLaMA-2-7B, LLaMA-3-8B, LLaMA-3-70B, Mistral-7B-v0.1). The "non-instructional data" also improved some models that underwent supervised fine-tuning and human preference alignment. Our LLaMA-3-70B-Instruct fine-tuned through "non-instructional data" is comparable with LLaMA-3.1-70B-Instruct on the Arena Hard leaderboard. We analyzed the "non-instructional data" and ensured it is devoid of content related to instruction fine-tuning. Our findings will inspire further investigation into how to develop instruction-following capabilities without explicit instruction-related data.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The paper attempts to address the following issues:

1. **Non-instructional Fine-tuning**:
   - **Background**: Current large language models (LLMs) typically require supervised data to achieve instruction-following capabilities. This data includes instructions and their correct responses.
   - **Problem**: Obtaining high-quality instruction-following data usually requires a significant amount of manual annotation work or complex generation processes.
   - **Objective**: Explore how to use non-instructional data to enhance the instruction-following capabilities of language models.
2. **Simplifying the Data Preparation Process**:
   - **Traditional Method**: The traditional process of creating instruction-following datasets is cumbersome, involving the collection of large amounts of text, classification, and formatting.
   - **New Method**: Simplify the data preparation process by using the first half of random text as instructions and completing the second half with GPT-3.5-turbo or GPT-4-turbo.
3. **Validating Model Performance**:
   - **Experimental Subjects**: Conduct fine-tuning experiments on various known pre-trained LLMs (such as LLaMA-2-7B, LLaMA-3-8B, LLaMA-3-70B, Mistral-7B-v0.1, etc.).
   - **Evaluation Benchmarks**: Use benchmarks like MT-Bench, Open LLM Leaderboard, and Arena Hard to evaluate and demonstrate that non-instructional data can significantly improve model performance.
4. **Exploring the Effectiveness of Non-instructional Data**:
   - **Analysis**: The study finds that non-instructional data can not only improve the performance of base models but also enhance the conversational abilities of instruction models.
   - **Conclusion**: Non-instructional data can serve as an efficient method to improve the instruction-following and conversational capabilities of language models.

Through the above research, the paper demonstrates the potential of non-instructional data in enhancing the performance of language models and provides new insights for future model fine-tuning.
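
The data construction described above (first half of a random text as the "instruction", a model-written continuation as the "response") can be sketched roughly as follows. This is a minimal illustration and not the authors' code. Several details are assumptions made here for the example: splitting at the midpoint of whitespace-delimited words, the `instruction`/`response` field names, the `min_words` filter, and the `complete` callable, which is a hypothetical stand-in for a call to GPT-3.5-turbo or GPT-4-turbo.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def split_in_half(text: str) -> Tuple[str, str]:
    """Split a text at the midpoint of its whitespace-delimited words.

    Assumption: the source only says "first half"; splitting on words
    is one plausible reading, chosen here for simplicity.
    """
    words = text.split()
    mid = len(words) // 2
    return " ".join(words[:mid]), " ".join(words[mid:])


def build_example(text: str, complete: Callable[[str], str]) -> Dict[str, str]:
    """Build one non-instructional training pair.

    `complete` is a hypothetical callable that continues a prompt; in the
    setup described above it would wrap GPT-3.5-turbo or GPT-4-turbo.
    The original second half of the text is discarded.
    """
    first_half, _ = split_in_half(text)
    return {"instruction": first_half, "response": complete(first_half)}


def build_dataset(
    texts: Iterable[str],
    complete: Callable[[str], str],
    min_words: int = 2,  # hypothetical filter: skip texts too short to split
) -> List[Dict[str, str]]:
    """Apply build_example to every text with at least `min_words` words."""
    return [
        build_example(text, complete)
        for text in texts
        if len(text.split()) >= min_words
    ]


if __name__ == "__main__":
    # Stub completer so the sketch runs without any API access.
    def fake_complete(prompt: str) -> str:
        return "[continuation of: " + prompt + "]"

    sample = ["The quick brown fox jumps over the lazy dog today"]
    for pair in build_dataset(sample, fake_complete):
        print(pair)
```

Passing the completer in as a function keeps the sketch independent of any particular API client, so the same code can be exercised with a stub during testing and with a real model call when building data.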

Non-instructional Fine-tuning: Enabling Instruction-Following Capabilities in Pre-trained Language Models without Instruction-Following Data
