Abstract: Large Language Models (LLMs) demonstrate exceptional reasoning capabilities, often achieving state-of-the-art performance in various tasks. However, their substantial computational and memory demands, due to billions of parameters, hinder deployment in resource-constrained environments. A promising solution is knowledge distillation, where LLMs transfer reasoning capabilities to Small Language Models (SLMs, $\le$ 1B parameters), enabling wider deployment on low-resource devices. Existing methods primarily focus on generating high-quality reasoning rationales for distillation datasets but often neglect the critical role of data quantity and quality. To address these challenges, we propose a Feedback-Driven Distillation (FDD) framework to enhance SLMs' mathematical reasoning capabilities. In the initialization stage, a distillation dataset is constructed by prompting LLMs to pair mathematical problems with corresponding reasoning rationales. We classify problems into easy and hard categories based on SLM performance. For easy problems, LLMs generate more complex variations, while for hard problems, new questions of similar complexity are synthesized. In addition, we propose a multi-round distillation paradigm to iteratively enrich the distillation datasets, thereby progressively improving the mathematical reasoning abilities of SLMs. Experimental results demonstrate that our method can make SLMs achieve SOTA mathematical reasoning performance.

What problem does this paper attempt to address?

This paper aims to enhance the mathematical reasoning ability of small language models (SLMs) for resource-constrained environments through a Feedback-Driven Distillation (FDD) framework. Specifically, it focuses on how to build a distillation dataset that improves the mathematical reasoning performance of SLMs, so that they reach or approach the level of large language models (LLMs) while keeping computational and memory requirements low.

### Background of the Paper

- **Large Language Models (LLMs)**: Although LLMs perform well on various reasoning tasks, their billions of parameters lead to high computational costs and memory requirements, which limits their deployment in resource-constrained environments.
- **Knowledge Distillation**: A method of transferring the knowledge of LLMs to SLMs, enabling SLMs to run on low-resource devices while maintaining strong reasoning performance.
- **Limitations of Existing Methods**: Most existing methods focus on generating high-quality reasoning rationales for distillation datasets, but overlook the impact of data quantity and data quality on mathematical reasoning ability.

### Solution

The paper proposes the Feedback-Driven Distillation (FDD) framework, which enhances the mathematical reasoning ability of SLMs through the following steps:

1. **Initialization Phase**: Use LLMs to construct an initial mathematical distillation dataset, with each problem accompanied by a corresponding Program-of-Thought (PoT) reasoning rationale. This data is used for the initial training of SLMs.
2. **Question Generation Phase**: Based on the performance of SLMs on the initial dataset, divide the questions into two categories: easy and hard. For easy questions, LLMs generate more complex variations; for hard questions, LLMs generate new questions of similar complexity. The newly generated questions are added to the distillation dataset, increasing its complexity and diversity.
3. **Fine-Tuning Phase**: Use the expanded distillation dataset to fine-tune SLMs from scratch, further enhancing their mathematical reasoning ability.
4. **Multi-round Distillation Paradigm**: Repeat the process over multiple rounds to keep enriching the distillation dataset and progressively improve the mathematical reasoning ability of SLMs.

### Experimental Results

- **Main Experimental Results**: The paper reports experiments on multiple mathematical reasoning datasets. The results show that the FDD framework significantly improves the mathematical reasoning performance of SLMs, which match or exceed some open-source LLMs (such as Llama-2 and WizardMath).
- **Transferability**: The FDD framework improves not only the in-domain mathematical reasoning ability of SLMs, but also their out-of-domain mathematical reasoning ability.
- **Impact of Model Size**: The size of the SLM has a significant impact on its mathematical reasoning performance, and larger models usually perform better.

### Conclusion

The FDD framework enhances the mathematical reasoning ability of SLMs by generating high-quality and diverse distillation data, enabling strong mathematical reasoning in resource-constrained environments. The method performs well in-domain, transfers to out-of-domain data, and is suitable for multiple mathematical reasoning tasks.
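The feedback loop described in the Solution steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `Example`, `run_program`, `split_by_feedback`, `fdd_round`, and `multi_round` are hypothetical names, the `student`, `train`, `harder`, and `similar` callables stand in for SLM inference, SLM fine-tuning, and LLM prompting, and the convention that a PoT rationale stores its result in a variable called `answer` is assumed for the example. The summary also does not say whether every round re-evaluates the full dataset or only the new questions; the sketch re-evaluates everything.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Example:
    question: str
    program: str   # PoT rationale: Python source assumed to set `answer`
    answer: float  # reference answer


def run_program(program: str) -> Optional[object]:
    """Execute a PoT rationale; return its `answer`, or None if it fails."""
    scope: dict = {}
    try:
        exec(program, scope)  # only run trusted or sandboxed code in practice
    except Exception:
        return None
    return scope.get("answer")


def split_by_feedback(
    dataset: List[Example], student: Callable[[str], str]
) -> Tuple[List[Example], List[Example]]:
    """Easy = the student's program yields the reference answer; hard = it does not."""
    easy, hard = [], []
    for ex in dataset:
        predicted = run_program(student(ex.question))
        solved = (
            isinstance(predicted, (int, float))
            and abs(predicted - ex.answer) < 1e-6
        )
        (easy if solved else hard).append(ex)
    return easy, hard


def fdd_round(
    dataset: List[Example],
    student: Callable[[str], str],
    harder: Callable[[Example], Example],   # hypothetical LLM call: more complex variation
    similar: Callable[[Example], Example],  # hypothetical LLM call: similar complexity
) -> List[Example]:
    """One feedback round: grow the dataset from the student's successes and failures."""
    easy, hard = split_by_feedback(dataset, student)
    new_examples = [harder(ex) for ex in easy] + [similar(ex) for ex in hard]
    return dataset + new_examples


def multi_round(
    dataset: List[Example],
    train: Callable[[List[Example]], Callable[[str], str]],  # hypothetical fine-tuning
    harder: Callable[[Example], Example],
    similar: Callable[[Example], Example],
    rounds: int = 2,
) -> Tuple[Callable[[str], str], List[Example]]:
    """Alternate between fine-tuning the student and enriching the dataset."""
    student = train(dataset)
    for _ in range(rounds):
        dataset = fdd_round(dataset, student, harder, similar)
        student = train(dataset)  # retrain on the expanded dataset
    return student, dataset
```

Passing the model-dependent pieces in as callables keeps the control flow visible: the only feedback signal is whether the student's generated program executes to the reference answer, and that signal alone decides which kind of new question is requested.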

Improving Mathematical Reasoning Capabilities of Small Language Models via Feedback-Driven Distillation

Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model

Distilling Mathematical Reasoning Capabilities into Small Language Models

SIKeD: Self-guided Iterative Knowledge Distillation for mathematical reasoning

Mixed Distillation Helps Smaller Language Models Reason Better

Teaching Small Language Models Reasoning Through Counterfactual Distillation

Mixed Distillation Helps Smaller Language Model Better Reasoning

Democratizing Reasoning Ability: Tailored Learning from Large Language Model

Effective Distillation of Table-based Reasoning Ability from LLMs

LLM Reasoning Engine: Specialized Training for Enhanced Mathematical Reasoning

Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMs Using Multi-Teacher Knowledge Distillation

Turning Dust into Gold: Distilling Complex Reasoning Capabilities from LLMs by Leveraging Negative Data

PaD: Program-aided Distillation Can Teach Small Models Reasoning Better Than Chain-of-thought Fine-tuning

Learning from Committee: Reasoning Distillation from a Mixture of Teachers with Peer-Review

Scaling Relationship on Learning Mathematical Reasoning with Large Language Models

Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks

Logic Distillation: Learning from Code Function by Function for Planning and Decision-making

Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation

Keypoint-based Progressive Chain-of-Thought Distillation for LLMs

LLAVADI: What Matters For Multimodal Large Language Models Distillation