Training Task Experts through Retrieval Based Distillation

Jiaxin Ge, Xueying Jia, Vijay Viswanathan, Hongyin Luo, Graham Neubig

2024-07-08

Abstract: One of the most reliable ways to create deployable models for specialized tasks is to obtain an adequate amount of high-quality task-specific data. However, such datasets often do not exist for specialized tasks. Existing methods address this by generating such data with large language models (LLMs) and then distilling that knowledge into smaller models. However, these methods are limited by the quality of the LLMs' output and tend to generate repetitive or incorrect data. In this work, we present Retrieval Based Distillation (ReBase), a method that first retrieves data from rich online sources and then transforms it into domain-specific data. This method greatly enhances data diversity. Moreover, ReBase generates Chain-of-Thought reasoning and distills the reasoning capacity of LLMs. We test our method on 4 benchmarks, and the results show that it significantly improves performance, by up to 7.8% on SQuAD, 1.37% on MNLI, and 1.94% on BigBench-Hard.
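The retrieval step can be pictured as nearest-neighbour search over a pool of text rows drawn from existing online datasets, ranked by similarity to a description of the target task. The sketch below is a minimal illustration under stated assumptions. It uses a toy bag-of-words cosine similarity in place of the dense sentence encoder a real system would more likely use. The `DATASTORE` contents and the `embed` and `retrieve` helpers are hypothetical and are not the authors' code.

```python
import math
import re
from collections import Counter

# Hypothetical datastore: rows collected from assorted public datasets.
DATASTORE = [
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
    "A recipe for sourdough bread requires flour, water, salt and a starter.",
    "The Amazon River discharges more water than any other river on Earth.",
    "Stock prices fell sharply after the central bank raised interest rates.",
]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase bag-of-words counts (stand-in for a dense encoder)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(task_query: str, k: int = 2) -> list[str]:
    """Return the k datastore rows most similar to the task query."""
    q = embed(task_query)
    ranked = sorted(DATASTORE, key=lambda row: cosine(q, embed(row)), reverse=True)
    return ranked[:k]


print(retrieve("Which river discharges the most water?", k=1))
# → ['The Amazon River discharges more water than any other river on Earth.']
```

In practice the query might combine the task instruction with a few demonstration examples, and the pool would be far larger. Only the ranking logic is meant to carry over from this sketch.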

Computation and Language

What problem does this paper attempt to address?

This paper addresses the problem of acquiring training data for task-specific expert models. The prevailing approach generates task-specific data with large language models (LLMs) and distills that knowledge into smaller models. However, it is limited by the quality of the LLMs' output, which is often repetitive or inaccurate. The paper introduces Retrieval Based Distillation (ReBase), which first retrieves data from rich online sources and then transforms it into domain-specific data, increasing data diversity. ReBase also generates Chain-of-Thought reasoning, distilling the reasoning capability of LLMs into smaller models. Across four benchmarks, ReBase improved performance by up to 7.8% on SQuAD, 1.37% on MNLI, and 1.94% on BigBench-Hard. The results indicate that retrieving and transforming data from many sources can effectively improve task-specific models. Rather than relying on one or a few datasets, ReBase increases the diversity and quality of the training data, particularly for tasks that require reasoning.
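The transformation step turns each retrieved row into a task-formatted training example that carries a Chain-of-Thought rationale, so the smaller student model learns to produce reasoning before its answer. The sketch below rests on several assumptions, since this summary does not give ReBase's actual prompts or output format. The prompt template, the `Input:` / `Reasoning:` / `Answer:` reply format, and the `transform` and `fake_llm` helpers are all hypothetical. `fake_llm` is an offline stub standing in for a real teacher LLM so that the code runs as written.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical prompt asking a teacher LLM to recast a retrieved row as one example of the task.
TRANSFORM_TEMPLATE = (
    "Task: {instruction}\n"
    "Source text: {row}\n"
    "Rewrite the source text as one example of this task. "
    "Reply with 'Input:', 'Reasoning:' and 'Answer:' lines."
)
REQUIRED = ("Input", "Reasoning", "Answer")


@dataclass
class DistillExample:
    prompt: str  # what the student model sees
    target: str  # what the student is trained to generate: rationale, then answer


def parse_fields(reply: str) -> dict[str, str]:
    """Collect the labelled 'Key: value' lines from the teacher's reply."""
    fields: dict[str, str] = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in REQUIRED:
            fields[key.strip()] = value.strip()
    return fields


def transform(row: str, instruction: str, call_llm: Callable[[str], str]) -> Optional[DistillExample]:
    """Turn one retrieved row into a chain-of-thought training example, or None if malformed."""
    reply = call_llm(TRANSFORM_TEMPLATE.format(instruction=instruction, row=row))
    fields = parse_fields(reply)
    # Drop malformed generations instead of training the student on them.
    if not all(k in fields for k in REQUIRED):
        return None
    target = f"{fields['Reasoning']} So the answer is: {fields['Answer']}"
    return DistillExample(prompt=f"{instruction}\n{fields['Input']}", target=target)


def fake_llm(prompt: str) -> str:
    """Offline stand-in for the teacher LLM: returns a fixed, well-formed reply."""
    return (
        "Input: Question: Which river discharges the most water? "
        "Context: The Amazon River discharges more water than any other river on Earth.\n"
        "Reasoning: The context says the Amazon discharges more water than any other river.\n"
        "Answer: The Amazon River"
    )


example = transform(
    "The Amazon River discharges more water than any other river on Earth.",
    "Answer the question using the context.",
    fake_llm,
)
print(example.target)
# → The context says the Amazon discharges more water than any other river. So the answer is: The Amazon River
```

The sketch places the rationale before the answer in the student's target. Under this layout, fine-tuning on such pairs trains the student to generate its reasoning first. Replies that lack any of the expected fields are discarded rather than repaired.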

Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains

Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation

Intermediate Distillation: Data-Efficient Distillation from Black-Box LLMs for Information Retrieval

Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation

Reprogramming Distillation for Medical Foundation Models

XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation

ERNIE-Search: Bridging Cross-Encoder with Dual-Encoder Via Self On-the-fly Distillation for Dense Passage Retrieval

EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval

Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks

LEAD: Liberal Feature-based Distillation for Dense Retrieval

Distillation Matters: Empowering Sequential Recommenders to Match the Performance of Large Language Model

XtremeDistil: Multi-stage Distillation for Massive Multilingual Models

Mixed Distillation Helps Smaller Language Models Reason Better

Distilling from Similar Tasks for Transfer Learning on a Budget

Improving task-agnostic BERT distillation with layer mapping search

Selective Cross-Task Distillation

Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones

On Good Practices for Task-Specific Distillation of Large Pretrained Visual Models

PairDistill: Pairwise Relevance Distillation for Dense Retrieval