Abstract: Large Language Models (LLMs) require precise alignment with complex instructions to optimize their performance in real-world applications. As the demand for refined instruction tuning data increases, traditional methods that evolve simple seed instructions often struggle to effectively enhance complexity or manage difficulty scaling across various domains. Our innovative approach, Task-Centered Instruction Evolution (TaCIE), addresses these shortcomings by redefining instruction evolution from merely evolving seed instructions to a more dynamic and comprehensive combination of elements. TaCIE starts by deconstructing complex instructions into their fundamental components. It then generates and integrates new elements with the original ones, reassembling them into more sophisticated instructions that progressively increase in difficulty, diversity, and complexity. Applied across multiple domains, LLMs fine-tuned with these evolved instructions have substantially outperformed those tuned with conventional methods, marking a significant advancement in instruction-based model fine-tuning.

What problem does this paper attempt to address?

The paper addresses the problem of aligning large language models (LLMs) with complex human instructions in practical applications. Specifically, it identifies two main weaknesses in existing methods for generating more complex instructions:

1. **Insufficient Difficulty Increment Management**: Existing methods such as Evol-Instruct often perform poorly when increasing task difficulty. The prompts they use are vague and lack specific guidance, which makes the results of instruction evolution hard to control and predict. For example, an attempt to add a constraint often fails, or merely replaces terms without truly increasing the task difficulty.
2. **Inadequate Consideration of Cross-Domain Tasks**: Existing methods do not handle the complexity of cross-domain tasks well. Although some methods, such as Instruction Fusion, increase task complexity by fusing two different instructions, they are usually limited to tasks in a single domain and lack diversity.

To overcome these problems, the paper proposes the **Task-Centered Instruction Evolution (TaCIE)** method. TaCIE redefines the instruction evolution process in the following ways:

- **Instruction Decomposition**: Decompose complex instructions into three basic components (background information, goals, and constraints), so that each component can be modified precisely and the resulting evolution is more substantial.
- **Deep Evolution**: Gradually increase the difficulty of newly generated instructions by adding new constraints or background settings, which keeps difficulty controllable and strengthens the logical reasoning the task demands.
- **Task Fusion**: Generate more complex and information-rich instructions by merging elements from different seed instructions, which is especially suitable for cross-domain tasks.

With these methods, TaCIE addresses the weaknesses of existing methods in difficulty increment management and cross-domain task handling. It also significantly improves LLM performance on multiple benchmarks, especially in instruction understanding, mathematics, and programming tasks.
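The three operations above can be illustrated with a small data-structure sketch. This is not the paper's implementation: TaCIE performs decomposition and evolution by prompting an LLM, whereas the example below only shows how the three components could be represented and recombined. The names `Instruction`, `deep_evolve`, and `fuse` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Instruction:
    """Hypothetical container for the three components of a decomposed instruction."""
    background: str
    goal: str
    constraints: Tuple[str, ...] = ()

    def render(self) -> str:
        # Reassemble the components into a single instruction string.
        lines = [self.background, self.goal]
        lines += [f"Constraint: {c}" for c in self.constraints]
        return "\n".join(line for line in lines if line)


def deep_evolve(inst: Instruction, new_constraint: str) -> Instruction:
    """Return a harder variant by appending one explicit constraint.

    In the paper the new element is generated by an LLM; here it is
    passed in by hand (an assumption made to keep the sketch runnable).
    """
    if new_constraint in inst.constraints:
        return inst
    return Instruction(inst.background, inst.goal,
                       inst.constraints + (new_constraint,))


def fuse(a: Instruction, b: Instruction) -> Instruction:
    """Merge the elements of two seed instructions into one instruction."""
    # dict.fromkeys removes duplicate constraints while keeping their order.
    merged = tuple(dict.fromkeys(a.constraints + b.constraints))
    return Instruction(
        background=f"{a.background} {b.background}".strip(),
        goal=f"{a.goal} In addition: {b.goal}",
        constraints=merged,
    )


seed = Instruction(
    background="You are given a list of integers.",
    goal="Write a function that sorts the list.",
    constraints=("Use Python.",),
)
harder = deep_evolve(seed, "Do not call built-in sorting functions.")
other = Instruction(
    background="The numbers are read from a CSV file.",
    goal="Report the median value.",
    constraints=("Use Python.",),
)
fused = fuse(harder, other)
print(fused.render())
```

Keeping the components separate is what makes each step controllable in this sketch: difficulty grows by one explicit constraint at a time, and fusion combines backgrounds and goals from different seeds without duplicating shared constraints.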

TaCIE: Enhancing Instruction Comprehension in Large Language Models through Task-Centred Instruction Evolution