Lion: Adversarial Distillation of Proprietary Large Language Models

Yuxin Jiang,Chunkit Chan,Mingyang Chen,Wei Wang

2023-10-14

Abstract: The practice of transferring knowledge from a sophisticated, proprietary large language model (LLM) to a compact, open-source LLM has garnered considerable attention. Previous works have focused on unidirectional knowledge distillation, aligning the responses of the student model with those of the teacher model on a set of instructions. Nevertheless, they overlooked the possibility of incorporating reciprocal "feedback" (identifying challenging instructions where the student model's performance falls short) to boost the student model's proficiency iteratively. To this end, we propose a novel adversarial distillation framework for more efficient knowledge transfer. Leveraging the versatile role adaptability of LLMs, we prompt the teacher model to identify "hard" instructions and generate new "hard" instructions for the student model, creating a three-stage adversarial loop of imitation, discrimination, and generation. By applying this adversarial framework, we successfully transfer knowledge from ChatGPT to a student model (named Lion), using only 70k training examples. Our results show that Lion-13B not only achieves open-ended generation capabilities comparable to ChatGPT but also surpasses conventional state-of-the-art (SOTA) instruction-tuned models like Vicuna-13B by 55.4% on challenging zero-shot reasoning benchmarks such as BIG-Bench Hard (BBH) and by 16.7% on AGIEval. Code and model can be found at https://github.com/YJiangcm/Lion.

Computation and Language

What problem does this paper attempt to address?

The main goal of this paper is to propose a new adversarial distillation framework for efficiently transferring knowledge from a proprietary large language model (LLM) to a compact, open-source LLM. Specifically, the researchers aim to address the following issues:

1. **Limitations of existing knowledge distillation methods**: Existing methods typically transfer knowledge in one direction only, from the teacher model to the student model, with no feedback mechanism to identify where the student model underperforms.
2. **Enhancing the student model's capabilities**: A feedback mechanism identifies the instructions that are most challenging for the student model ("hard instructions"), and new hard instructions are then generated to improve the student model iteratively.
3. **Achieving efficient model distillation**: The researchers propose an iterative framework with three stages (imitation, discrimination, and generation) that extracts knowledge from a complex proprietary LLM and transfers it to a more lightweight open-source LLM.

Through this method, the paper aims to create a compact open-source model that can rival or even surpass proprietary LLMs on certain tasks. It demonstrates the effectiveness and efficiency of the approach by transferring the knowledge of ChatGPT to a student model named Lion. Experimental results show that Lion not only approaches ChatGPT's performance on open-ended generation tasks but also significantly outperforms existing state-of-the-art instruction-tuned models on some reasoning benchmarks.
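The three-stage loop of imitation, discrimination, and generation described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: `ToyStudent`, `toy_teacher`, `exact_match_referee`, `toy_generator`, the `capacity` limit, and the `threshold` parameter are all hypothetical stand-ins. In the paper's setting the teacher would be a proprietary LLM prompted into each role, and the student would be fine-tuned rather than filling a lookup table.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyStudent:
    """Hypothetical weak student: stores teacher answers but truncates them."""
    capacity: int = 10
    memory: Dict[str, str] = field(default_factory=dict)

    def fit(self, pairs: Dict[str, str]) -> None:
        # Stand-in for fine-tuning on (instruction, teacher response) pairs.
        self.memory.update(pairs)

    def answer(self, instruction: str) -> str:
        # Limited capacity: long answers are cut off, so they differ from the teacher's.
        return self.memory.get(instruction, "")[: self.capacity]


def toy_teacher(instruction: str) -> str:
    # Hypothetical teacher: the "ideal" response is the upper-cased instruction.
    return instruction.upper()


def exact_match_referee(instruction: str, teacher_answer: str, student_answer: str) -> float:
    # Hypothetical referee: gap of 1.0 when the student fails to match the teacher.
    return 0.0 if teacher_answer == student_answer else 1.0


def toy_generator(hard_instruction: str) -> str:
    # Hypothetical generator: derives a new instruction from a hard one.
    return hard_instruction + " (variant)"


def adversarial_distillation(
    student: ToyStudent,
    teacher: Callable[[str], str],
    referee: Callable[[str, str, str], float],
    generator: Callable[[str], str],
    pool: List[str],
    iterations: int = 2,
    threshold: float = 0.5,  # assumed cut-off for calling an instruction "hard"
) -> List[str]:
    pool = list(pool)
    for _ in range(iterations):
        # Stage 1, imitation: align the student with the teacher's responses.
        student.fit({x: teacher(x) for x in pool})
        # Stage 2, discrimination: find instructions where the student falls short.
        hard = [x for x in pool if referee(x, teacher(x), student.answer(x)) > threshold]
        # Stage 3, generation: create new hard instructions and grow the pool.
        for x in hard:
            candidate = generator(x)
            if candidate not in pool:
                pool.append(candidate)
    return pool


final_pool = adversarial_distillation(
    ToyStudent(capacity=10),
    toy_teacher,
    exact_match_referee,
    toy_generator,
    ["add 2+2", "summarize the history of rome"],
)
print(final_pool)
```

In this toy run only the long instruction exceeds the student's capacity, so the pool grows around it while the easy instruction spawns no variants. That mirrors the idea that training data is concentrated where the student is weakest.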

Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation

Using Advanced LLMs to Enhance Smaller LLMs: An Interpretable Knowledge Distillation Approach

Knowledge Distillation Using Frontier Open-source LLMs: Generalizability and the Role of Synthetic Data

LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models

Supervised Knowledge Makes Large Language Models Better In-context Learners

AMR-Evol: Adaptive Modular Response Evolution Elicits Better Knowledge Distillation for Large Language Models in Code Generation

Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMs Using Multi-Teacher Knowledge Distillation

Knowledge Distillation of Black-Box Large Language Models

MiniLLM: Knowledge Distillation of Large Language Models

Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback

GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment

LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement

Can a student Large Language Model perform as well as it's teacher?

Unlock the Power: Competitive Distillation for Multi-Modal Large Language Models

Liger Kernel: Efficient Triton Kernels for LLM Training

Instruction Distillation Makes Large Language Models Efficient Zero-shot Rankers

Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models

LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge

UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition

Herd: Using multiple, smaller LLMs to match the performances of proprietary, large LLMs via an intelligent composer