TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise

Nan He, Hanyu Lai, Chenyang Zhao, Zirui Cheng, Junting Pan, Ruoyu Qin, Ruofan Lu, Rui Lu, Yunchen Zhang, Gangming Zhao, Zhaohui Hou, Zhiyuan Huang, Shaoqing Lu, Ding Liang, Mingjie Zhan
2024-07-16
Abstract: Large Language Models (LLMs) exhibit impressive reasoning and data augmentation capabilities in various NLP tasks. However, what about small models? In this work, we propose TeacherLM-7.1B, capable of annotating relevant fundamentals, chain of thought, and common mistakes for most NLP samples, which makes annotation more than just an answer, thus allowing other models to learn "why" instead of just "what". The TeacherLM-7.1B model achieved a zero-shot score of 52.3 on MMLU, surpassing most models with over 100B parameters. Even more remarkable is its data augmentation ability. Based on TeacherLM-7.1B, we augmented 58 NLP datasets and taught various student models with different parameters from OPT and BLOOM series in a multi-task setting. The experimental results indicate that the data augmentation provided by TeacherLM has brought significant benefits. We will release the TeacherLM series of models and augmented datasets as open-source.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how to make data augmentation more effective for NLP tasks, especially for small models. It proposes TeacherLM-7.1B, a model that annotates each sample with relevant fundamentals, a chain of thought, and common mistakes, so that a student model learns why an answer is correct rather than only memorizing the answer. This shifts training from result-oriented to process-oriented learning. The paper also uses TeacherLM to augment 58 NLP datasets, trains student models of different sizes from the OPT and BLOOM series on the augmented data in a multi-task setting, and reports significant benefits from the augmentation, alongside a zero-shot MMLU score of 52.3 for TeacherLM-7.1B itself.
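The annotation scheme summarized above can be pictured as a record with three extra fields attached to each sample. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class name `AugmentedSample`, the function `to_training_text`, the field labels, and the field ordering are all hypothetical, chosen only to illustrate how an answer-only sample might be expanded into process-oriented training text.

```python
from dataclasses import dataclass


@dataclass
class AugmentedSample:
    """Hypothetical container for one sample plus teacher annotations."""
    question: str
    answer: str
    fundamentals: str       # background knowledge relevant to the question
    chain_of_thought: str   # step-by-step reasoning toward the answer
    common_mistakes: str    # typical errors to avoid


def to_training_text(sample: AugmentedSample) -> str:
    """Flatten a sample into one training string.

    The labels and their order are assumptions made for illustration;
    the source does not specify a serialization format.
    """
    parts = [
        f"Question: {sample.question}",
        f"Fundamentals: {sample.fundamentals}",
        f"Chain of thought: {sample.chain_of_thought}",
        f"Common mistakes: {sample.common_mistakes}",
        f"Answer: {sample.answer}",
    ]
    return "\n".join(parts)


# Toy example with hand-written annotations (not produced by any model).
example = AugmentedSample(
    question="What is 2 + 2?",
    answer="4",
    fundamentals="Addition combines two quantities into a total.",
    chain_of_thought="Start at 2 and count up two more: 3, 4.",
    common_mistakes="Confusing addition with multiplication of other operands.",
)
training_text = to_training_text(example)
```

In this sketch the student model would be trained on `training_text` instead of the bare question and answer pair, which is one simple way to expose it to the reasoning behind the answer.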