Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

An Yang, Beichen Zhang, Binyuan Hui, Bofei Gao, Bowen Yu, Chengpeng Li, Dayiheng Liu, Jianhong Tu, Jingren Zhou, Junyang Lin, Keming Lu, Mingfeng Xue, Runji Lin, Tianyu Liu, Xingzhang Ren, Zhenru Zhang
2024-09-19
Abstract: In this report, we present a series of math-specific large language models: Qwen2.5-Math and Qwen2.5-Math-Instruct-1.5B/7B/72B. The core innovation of the Qwen2.5 series lies in integrating the philosophy of self-improvement throughout the entire pipeline, from pre-training and post-training to inference: (1) During the pre-training phase, Qwen2-Math-Instruct is utilized to generate large-scale, high-quality mathematical data. (2) In the post-training phase, we develop a reward model (RM) by conducting massive sampling from Qwen2-Math-Instruct. This RM is then applied to the iterative evolution of data in supervised fine-tuning (SFT). With a stronger SFT model, it's possible to iteratively train and update the RM, which in turn guides the next round of SFT data iteration. On the final SFT model, we employ the ultimate RM for reinforcement learning, resulting in the Qwen2.5-Math-Instruct. (3) Furthermore, during the inference stage, the RM is used to guide sampling, optimizing the model's performance. Qwen2.5-Math-Instruct supports both Chinese and English, and possesses advanced mathematical reasoning capabilities, including Chain-of-Thought (CoT) and Tool-Integrated Reasoning (TIR). We evaluate our models on 10 mathematics datasets in both English and Chinese, such as GSM8K, MATH, GaoKao, AMC23, and AIME24, covering a range of difficulties from grade school level to math competition problems.
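Step (3) of the abstract, using the RM to guide sampling at inference time, is commonly realized as best-of-N selection: draw several candidate solutions and keep the one the reward model scores highest. The following is a minimal sketch under that assumption, not the authors' implementation. The names `best_of_n`, `toy_generate`, and `toy_score` are hypothetical; a real setup would replace the toy functions with calls to a language model and a trained reward model.

```python
from typing import Callable, List, Tuple


def best_of_n(
    problem: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 8,
) -> Tuple[str, float]:
    """Sample n candidate solutions and return the highest-scoring one with its score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [generate(problem) for _ in range(n)]
    scored = [(score(problem, candidate), candidate) for candidate in candidates]
    # max() keeps the first candidate among ties, so the result is deterministic
    # for a fixed sequence of samples.
    best_score, best_candidate = max(scored, key=lambda pair: pair[0])
    return best_candidate, best_score


# Toy stand-ins (hypothetical): a canned "sampler" and a rule-based "reward model".
_CANNED_SAMPLES = iter(["2 + 2 = 5", "2 + 2 = 4", "2 + 2 = 22"])


def toy_generate(problem: str) -> str:
    """Pretend to sample one solution from a language model."""
    return next(_CANNED_SAMPLES)


def toy_score(problem: str, solution: str) -> float:
    """Pretend to be a reward model: reward only the correct final answer."""
    return 1.0 if solution.endswith("= 4") else 0.0


if __name__ == "__main__":
    answer, reward = best_of_n("What is 2 + 2?", toy_generate, toy_score, n=3)
    print(answer, reward)
```

The generator and scorer are passed in as plain callables, so the selection logic stays independent of any particular model or serving stack.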
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses how to improve the mathematical reasoning and problem-solving capabilities of large language models, especially on complex mathematical problems. To that end, the research team develops a series of math-specific large language models (the Qwen2.5-Math series) and applies self-improvement techniques to raise their performance.

### Main problems and solutions:

1. **Improving mathematical reasoning ability**:
   - **Background**: Existing large language models perform poorly at mathematical reasoning, mainly because their pre-training data contains too little mathematical content.
   - **Solution**: Enrich the pre-training data with high-quality mathematical datasets (such as Qwen Math Corpus v1 and v2) to improve the model's mathematical reasoning ability.
2. **Automatically generating high-quality mathematical data**:
   - **Background**: Manually annotating mathematical problems and solutions is time-consuming and costly.
   - **Solution**: Use the Qwen2-Math-Instruct model to automatically generate large-scale, high-quality mathematical problems and their solutions, maintaining data quantity and quality while reducing the manual annotation workload.
3. **Introduction of the reward model**:
   - **Background**: Supervising on the final answer alone does not provide enough feedback, especially for complex reasoning processes.
   - **Solution**: Develop a reward model (Qwen2.5-Math-RM) that evaluates the quality of a reasoning path and guides training in the supervised fine-tuning (SFT) and reinforcement learning (RL) stages, so that the model better handles intermediate steps and reasoning logic.
4. **Multi-language support and tool-integrated reasoning**:
   - **Background**: Many existing models support only English and cannot work with external tools (such as a Python interpreter) for exact calculation.
   - **Solution**: The Qwen2.5-Math series models support Chinese and English and introduce the tool-integrated reasoning (TIR) mode, which lets the model call external tools for complex calculations during reasoning, further improving problem-solving accuracy.

### Evaluation results:

- The Qwen2.5-Math series models perform strongly on multiple mathematical benchmarks (such as GSM8K, MATH, and GaoKao), significantly outperforming leading open-source and closed-source models.
- In particular, on difficult competition questions such as AMC 2023, Qwen2.5-Math-72B-Instruct solves almost all problems with the help of the reward model.
- Even the smallest, 1.5B-parameter model achieves a score close to 80 when using a Python interpreter, surpassing many current models.

In conclusion, this paper aims to significantly improve the mathematical reasoning and problem-solving capabilities of large language models through a combination of data synthesis, reward modeling, and tool-integrated reasoning, enabling them to handle complex mathematical problems more effectively.
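The tool-integrated reasoning (TIR) mode described above can be pictured as a loop: the model writes a code snippet, an interpreter runs it, and the printed result is appended to the context before the model continues. The sketch below illustrates that loop under several assumptions. The fenced `python` / `output` block convention, the function names, and the canned `toy_model` are all hypothetical stand-ins, and `exec` is used without any sandboxing, which would not be acceptable for untrusted model output.

```python
import contextlib
import io
import re
from typing import Callable

FENCE = "`" * 3  # built programmatically so this listing can sit inside a fence
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def run_snippet(code: str) -> str:
    """Execute a snippet and return what it printed. NOT sandboxed: illustration only."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()


def tool_integrated_reasoning(
    problem: str, model: Callable[[str], str], max_rounds: int = 4
) -> str:
    """Alternate between model replies and code execution until no code is emitted."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    transcript = problem
    reply = ""
    for _ in range(max_rounds):
        reply = model(transcript)
        transcript += "\n" + reply
        match = CODE_BLOCK.search(reply)
        if match is None:
            return reply  # no code to run, so treat this reply as the final answer
        output = run_snippet(match.group(1))
        # Feed the interpreter output back to the model (assumed block format).
        transcript += "\n" + FENCE + "output\n" + output + "\n" + FENCE
    return reply  # round budget exhausted; return the last reply as-is


def toy_model(transcript: str) -> str:
    """Hypothetical stand-in for an LLM: emit code first, then read the result."""
    marker = FENCE + "output\n"
    if marker not in transcript:
        return "Let me compute it.\n" + FENCE + "python\nprint(sum(range(1, 101)))\n" + FENCE
    result = transcript.rsplit(marker, 1)[1].split("\n", 1)[0]
    return f"The answer is {result}."


if __name__ == "__main__":
    print(tool_integrated_reasoning("What is 1 + 2 + ... + 100?", toy_model))
```

Delegating the arithmetic to the interpreter is what lets a small model avoid calculation slips; the model only has to write correct code and read the result back.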