WizardLM: Empowering Large Language Models to Follow Complex Instructions

Can Xu, Qingfeng Sun, Kai Zheng, Xiubo Geng, Pu Zhao, Jiazhan Feng, Chongyang Tao, Daxin Jiang

2023-06-10

Abstract: Training large language models (LLMs) with open-domain instruction-following data has brought colossal success. However, manually creating such instruction data is very time-consuming and labor-intensive. Moreover, humans may struggle to produce high-complexity instructions. In this paper, we show an avenue for creating large amounts of instruction data with varying levels of complexity using an LLM instead of humans. Starting with an initial set of instructions, we use our proposed Evol-Instruct to rewrite them step by step into more complex instructions. Then, we mix all generated instruction data to fine-tune LLaMA. We call the resulting model WizardLM. Human evaluations on a complexity-balanced test bed and Vicuna's test set show that instructions from Evol-Instruct are superior to human-created ones. By analyzing the human evaluation results of the high-complexity part, we demonstrate that outputs from our WizardLM are preferred to outputs from OpenAI ChatGPT. In GPT-4 automatic evaluation, WizardLM achieves more than 90% of ChatGPT's capacity on 17 out of 29 skills. Even though WizardLM still lags behind ChatGPT in some aspects, our findings suggest that fine-tuning with AI-evolved instructions is a promising direction for enhancing LLMs. Our code and data are public at https://github.com/nlpxucan/WizardLM

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

The paper aims to address the difficulty large language models (LLMs) have in following complex instructions. It proposes a method called Evol-Instruct, which uses an LLM to automatically generate open-domain instruction data at varying levels of complexity, and then uses that data to fine-tune an LLM. The main issues the paper attempts to solve are as follows:

1. **Difficulty in manually creating instruction datasets**: Manually creating high-quality, high-complexity instruction datasets is time-consuming and labor-intensive, and it is hard for humans to produce instructions that are both diverse and complex.
2. **Limitations of existing datasets**: Existing closed-domain instruction datasets lack diversity and are usually tailored to a single task (such as translation or summarization), so they fail to meet the multi-task demands of real-life scenarios.
3. **Enhancing LLMs' ability to handle complex instructions**: Fine-tuning LLMs on automatically evolved instruction datasets improves their ability to handle complex instructions, especially their performance on high-difficulty instructions.

The paper reports that Evol-Instruct can automatically generate instruction datasets with higher diversity and complexity, and it validates the method through experiments, highlighting in particular its advantages on high-difficulty instructions.
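The step-by-step rewriting described above can be pictured with a minimal sketch. This is only an illustration under stated assumptions, not the authors' implementation: the names `evolve_instructions`, `rewrite`, `toy_rewrite`, and `rounds` are hypothetical, the rewriting step is a plain string function standing in for an LLM call, and keeping the seed instructions in the mixed pool is an assumption of this sketch.

```python
from typing import Callable, List


def evolve_instructions(
    seeds: List[str],
    rewrite: Callable[[str], str],
    rounds: int,
) -> List[str]:
    """Evolve seed instructions for a number of rounds and mix all generations.

    `rewrite` is a hypothetical stand-in for an LLM call that returns a more
    complex version of the instruction it receives.
    """
    pool = list(seeds)      # assumption: the seeds stay in the mixed pool
    current = list(seeds)
    for _ in range(rounds):
        # Each round rewrites the previous generation into a harder one.
        current = [rewrite(instruction) for instruction in current]
        pool.extend(current)
    return pool


def toy_rewrite(instruction: str) -> str:
    # Toy placeholder: a real system would prompt an LLM here instead.
    return instruction + " Explain your reasoning step by step."


seeds = ["Sort a list of integers.", "Summarize this paragraph."]
mixed = evolve_instructions(seeds, toy_rewrite, rounds=2)
```

With two seeds and two rounds, `mixed` holds six instructions: the two seeds, their first rewrites, and their second rewrites. In the paper's setting, a pool like this would be the data used to fine-tune the model.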

WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models

Evaluating Large Language Models at Evaluating Instruction Following

Large Language Models as Code Executors: An Exploratory Study

Magicoder: Empowering Code Generation with OSS-Instruct

On LLM Wizards: Identifying Large Language Models' Behaviors for Wizard of Oz Experiments

GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction

BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models

LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement

Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models

Can Large Language Models Understand Real-World Complex Instructions?

Benchmarking Large Language Models on Controllable Generation under Diversified Instructions

EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models

Empowering Large Language Models for Textual Data Augmentation

LLM4DS: Evaluating Large Language Models for Data Science Code Generation

Automatic Instruction Evolving for Large Language Models

Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

Use Your INSTINCT: INSTruction optimization for LLMs usIng Neural bandits Coupled with Transformers