Distilling LLMs' Decomposition Abilities into Compact Language Models

Denis Tarasov, Kumar Shridhar
2024-02-02
Abstract: Large Language Models (LLMs) have demonstrated proficiency in their reasoning abilities, yet their large size presents scalability challenges and limits any further customization. In contrast, compact models offer customized training but often fall short in solving complex reasoning tasks. This study focuses on distilling the LLMs' decomposition skills into compact models using offline reinforcement learning. We leverage the advancements in the LLMs' capabilities to provide feedback and generate a specialized task-specific dataset for training compact models. The development of an AI-generated dataset and the establishment of baselines constitute the primary contributions of our work, underscoring the potential of compact models in replicating complex problem-solving skills.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper asks how the complex problem-solving abilities of large language models (LLMs), which are costly to run and hard to customize, can be distilled into smaller models. It focuses on three points:

1. **The tension between model size and reasoning ability**: Although large language models exhibit strong reasoning capabilities, their size limits further customization and demands substantial computational resources.
2. **Limitations of small models**: While small models are easier to customize and train, they perform poorly on complex reasoning tasks.
3. **Insufficient research on problem decomposition ability**: Despite extensive research on end-to-end reasoning methods, the ability to decompose complex problems into simple sub-problems has not been fully explored.

To address these issues, the authors propose distilling the problem decomposition skills of large language models into compact models using offline reinforcement learning. The main contributions are:

- Creating an AI-generated dataset for mathematical reasoning tasks (GSM8K-AI-SubQ) to train small models.
- Establishing baselines that compare different training methods and demonstrate the potential of small models to replicate complex problem-solving skills.

In summary, the paper attempts to improve the performance of small models on tasks such as mathematical reasoning through offline reinforcement learning and an AI-generated dataset, and it offers the dataset and baselines as a starting point for future research.
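To make the setup described above more concrete, here is a minimal Python sketch of how a decomposition dataset with LLM feedback might be turned into training data for a compact model. Everything in it is an assumption made for illustration: the record layout (`DecompositionSample`), the binary feedback scale, the example problem, and the "keep only fully approved decompositions" filter are hypothetical and are not taken from the paper or from the actual GSM8K-AI-SubQ format. Filtering by return is one simple way to use logged feedback without further interaction with the teacher; the paper's own baselines may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DecompositionSample:
    """Hypothetical record: a problem, a teacher LLM's sub-questions, and feedback."""
    problem: str
    sub_questions: List[str]
    feedback: List[int]  # assumed scale: 1 = sub-question judged useful, 0 = not useful


def sample_return(sample: DecompositionSample) -> float:
    """Average feedback over the sub-questions, used as a trajectory-level reward."""
    if not sample.feedback:
        return 0.0
    return sum(sample.feedback) / len(sample.feedback)


def filter_by_return(
    samples: List[DecompositionSample], threshold: float = 1.0
) -> List[DecompositionSample]:
    """Keep only decompositions whose return reaches the threshold."""
    return [s for s in samples if sample_return(s) >= threshold]


def to_training_pairs(samples: List[DecompositionSample]) -> List[Tuple[str, str]]:
    """Build (prompt, target) text pairs for supervised fine-tuning of a compact model."""
    return [
        (f"Decompose: {s.problem}", "\n".join(s.sub_questions))
        for s in samples
    ]


# Made-up example data, not drawn from any real dataset.
good = DecompositionSample(
    problem="Tom has 3 boxes with 4 apples each and eats 2. How many apples are left?",
    sub_questions=[
        "How many apples does Tom have in total?",
        "How many apples are left after he eats 2?",
    ],
    feedback=[1, 1],
)
bad = DecompositionSample(
    problem="Tom has 3 boxes with 4 apples each and eats 2. How many apples are left?",
    sub_questions=[
        "How many apples does Tom have in total?",
        "What color are the apples?",
    ],
    feedback=[1, 0],
)

kept = filter_by_return([good, bad])
pairs = to_training_pairs(kept)
```

In this sketch only `good` survives the filter, so the compact model would be trained to imitate decompositions that the teacher's feedback fully approved. Lowering `threshold` trades data quantity against decomposition quality.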