Abstract: The recent advancement of large language models (LLMs) has been achieved through a combination of instruction tuning and human alignment. However, building manually crafted instruction datasets and performing human alignment have become the bottleneck for scaling the development of LLMs. In this paper, we exploit the idea of using AI models in lieu of humans as the teacher to train student LLMs. Our method is inspired by how human students refine their writing skills by following rubrics and learning from the revisions offered by their tutors. Specifically, we employ a teacher LLM to create a curriculum for instruction tuning of the student LLM, namely Curriculum Instruction TunING (CITING). It encompasses two main steps: (1) the teacher LLM crafts the rubrics for evaluating the answers corresponding to various types of questions, and (2) the student LLM learns to follow the rubrics and perform self-correction from the revisions made by the teacher. We carry out these two steps iteratively to realize the CITING procedure. We compare CITING to a series of state-of-the-art baselines on four datasets. Our method demonstrates strong improvement in articulateness, depth, and comprehensiveness under GPT-4 evaluation. Specifically, it achieves an average winning rate of 79.4% over SFT, 73.4% over RLHF, 78.1% over RRHF, and 76.3% over RAFT.
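The winning rates quoted in the abstract come from pairwise comparisons judged by GPT-4. As a rough illustration of the arithmetic only, the sketch below computes a win rate from a list of pairwise verdicts and averages it across datasets. The verdict labels (`"win"`, `"tie"`, `"loss"`), the treatment of ties, and the unweighted mean over datasets are all assumptions made here for illustration; the text above does not state how the paper aggregates its numbers.

```python
from collections import Counter
from typing import Dict, List


def win_rate(verdicts: List[str], ties_count_half: bool = False) -> float:
    """Fraction of pairwise comparisons won by the evaluated model.

    `verdicts` holds one label per comparison: "win", "tie", or "loss".
    These labels are hypothetical, not taken from the paper.
    """
    if not verdicts:
        raise ValueError("at least one verdict is required")
    counts = Counter(verdicts)
    wins = float(counts["win"])
    if ties_count_half:
        # Optional convention (an assumption): a tie counts as half a win.
        wins += 0.5 * counts["tie"]
    return wins / len(verdicts)


def average_win_rate(verdicts_by_dataset: Dict[str, List[str]]) -> float:
    """Unweighted mean of the per-dataset win rates (an assumed aggregation)."""
    rates = [win_rate(v) for v in verdicts_by_dataset.values()]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    toy = {
        "dataset_a": ["win", "loss"],
        "dataset_b": ["win", "win"],
    }
    print(f"average win rate: {average_win_rate(toy):.2%}")
```

An unweighted mean treats every dataset equally regardless of size; pooling all verdicts before dividing would weight larger datasets more heavily, and the two conventions can give different figures.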

What problem does this paper attempt to address?

The problem this paper addresses is the bottleneck in the instruction tuning and human alignment of large language models (LLMs). Specifically, constructing high-quality, manually crafted instruction datasets and performing human alignment are costly and time-consuming, which has become the main obstacle to scaling LLM development. To address this, the paper proposes Curriculum Instruction TunING (CITING), a method in which an advanced teacher LLM generates a curriculum that guides the learning of a student LLM, reducing the dependence on manual annotation while improving model performance.

### Main Contributions

1. **Curriculum Design and Standard Setting**: The teacher LLM formulates evaluation criteria (rubrics) for different types of questions. These criteria are used to evaluate the quality of the student's answers and also serve as additional guidance that helps the student LLM correct wrong answers.
2. **Learning and Revision**: Starting from the student LLM's initial responses, the teacher LLM provides personalized revision suggestions. By comparing the answers before and after revision, the student LLM learns to improve its responses through self-reflection. This process can be iterated to further enhance the student's performance.

### Experimental Results

The paper reports experiments on four datasets: Alpaca, World Knowledge, Reading Comprehension, and Commonsense Reasoning. The results show that CITING significantly outperforms existing baseline methods on all metrics, especially in zero-shot tasks. Responses are evaluated on three criteria:

- **Articulate (Clarity)**: the structure, language quality, and overall readability of the response.
- **In-depth (Depth)**: how deeply and in how much detail the response covers the topic or question.
- **Comprehensive (Comprehensiveness)**: the breadth of the response, that is, whether it covers multiple angles and relevant aspects.

### Conclusion

By using a teacher LLM to generate curricula and revision suggestions, CITING reduces the dependence on manual annotation and significantly improves the performance of the student LLM. The method performs well on multiple datasets, especially on commonsense reasoning tasks, showing strong generalization and reasoning abilities.
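The two-step procedure summarized above (rubric crafting by the teacher, then revision-based learning by the student, repeated over several rounds) can be sketched as a small control loop. This is a minimal sketch under stated assumptions, not the authors' implementation: every name here (`RevisionExample`, `build_rubrics`, `citing_round`, `run_citing`, and the callables `categorize`, `write_rubric`, `teacher_revise`, `finetune`) is hypothetical, and the models are passed in as plain Python callables so the flow can be run without any LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Student = Callable[[str], str]  # maps an instruction to a response


@dataclass
class RevisionExample:
    """One training record: the student's draft and the teacher's revision."""
    instruction: str
    rubric: str
    initial_response: str
    revised_response: str


def build_rubrics(instructions: List[str],
                  categorize: Callable[[str], str],
                  write_rubric: Callable[[str], str]) -> Dict[str, str]:
    """Step 1: the teacher writes one rubric per question type."""
    rubrics: Dict[str, str] = {}
    for instruction in instructions:
        category = categorize(instruction)
        if category not in rubrics:
            rubrics[category] = write_rubric(category)
    return rubrics


def citing_round(instructions: List[str],
                 rubrics: Dict[str, str],
                 categorize: Callable[[str], str],
                 student: Student,
                 teacher_revise: Callable[[str, str, str], str],
                 ) -> List[RevisionExample]:
    """Step 2: collect (draft, revision) pairs for the student to learn from."""
    examples = []
    for instruction in instructions:
        rubric = rubrics[categorize(instruction)]
        draft = student(instruction)
        revised = teacher_revise(instruction, rubric, draft)
        examples.append(RevisionExample(instruction, rubric, draft, revised))
    return examples


def run_citing(instructions: List[str],
               categorize: Callable[[str], str],
               write_rubric: Callable[[str], str],
               student: Student,
               teacher_revise: Callable[[str, str, str], str],
               finetune: Callable[[Student, List[RevisionExample]], Student],
               num_rounds: int = 2) -> Student:
    """Iterate step 2, updating the student after every round."""
    rubrics = build_rubrics(instructions, categorize, write_rubric)
    for _ in range(num_rounds):
        examples = citing_round(instructions, rubrics, categorize,
                                student, teacher_revise)
        # `finetune` stands in for whatever training procedure is used.
        student = finetune(student, examples)
    return student
```

In a real setting, `write_rubric` and `teacher_revise` would wrap calls to the teacher LLM and `finetune` would wrap a training run on the collected revision pairs. Keeping them as injected callables makes the iteration structure easy to test with stand-in functions.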

CITING: Large Language Models Create Curriculum for Instruction Tuning
