Abstract: We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.

What problem does this paper attempt to address?

The main problem this paper addresses is the development of a large language model, Llemma, specialized for mathematics, with the goal of improving mathematical problem solving, tool use, and formal theorem proving. Specifically, the authors continue pretraining an existing code language model (Code Llama) on a mathematics-specific data set (Proof-Pile-2), so that the resulting model performs well at mathematical reasoning and problem solving.

### Main problem decomposition:

1. **Improve mathematical reasoning ability**:
   - Continued pretraining on a mixed data set (Proof-Pile-2) containing scientific papers, web data with mathematical content, and mathematical code improves the model's performance in mathematical reasoning and problem solving.
   - On the MATH benchmark, Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis.
2. **Tool use and formal theorem proving**:
   - Llemma can use computational tools (such as Python interpreters and formal theorem provers) to solve mathematical problems without further finetuning.
   - The model can generate formal mathematical proofs, for example with interactive proof assistants such as Isabelle and Lean.
3. **Openness and reproducibility**:
   - Unlike previous closed-access language models for mathematics, Llemma is open access: the authors have released all training data and code so that other researchers can build on them.
   - The 7-billion-parameter and 34-billion-parameter models are released, together with the Proof-Pile-2 and AlgebraicStack data sets used for training.

### Key points of the solution:

- **Data set**: Proof-Pile-2 is a mixed data set of 55 billion tokens, covering scientific papers, web data containing mathematical content, and mathematical code.
- **Model architecture**: Llemma is initialized from Code Llama and trained further, yielding 7-billion-parameter and 34-billion-parameter models.
- **Evaluation method**: The model is evaluated with few-shot prompting and chain-of-thought reasoning, and performs strongly on multiple mathematical benchmarks.

Overall, this paper aims to advance mathematical reasoning and problem solving by building a language model optimized for mathematics, and to provide an open platform for future research.
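The few-shot, chain-of-thought evaluation mentioned above can be illustrated with a minimal Python sketch. It is not the paper's evaluation code: the function names, the `Problem:`/`Solution:` layout, and the `Final answer:` marker are all assumptions chosen for illustration, and the call to an actual model is deliberately left out.

```python
from typing import List, Optional, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Join worked (problem, solution) pairs, then append the new problem
    with an empty solution slot for the model to complete.

    The "Problem:/Solution:" layout is a hypothetical format, not Llemma's.
    """
    blocks = [f"Problem: {q}\nSolution: {s}" for q, s in examples]
    blocks.append(f"Problem: {question}\nSolution:")
    return "\n\n".join(blocks)


def extract_final_answer(completion: str, marker: str = "Final answer:") -> Optional[str]:
    """Return the text after the last occurrence of `marker`, up to the end
    of that line, or None if the marker is absent or nothing follows it.

    The marker string is an assumption; real benchmarks use their own
    answer conventions.
    """
    idx = completion.rfind(marker)
    if idx == -1:
        return None
    rest = completion[idx + len(marker):]
    answer = rest.split("\n", 1)[0].strip()
    return answer or None


if __name__ == "__main__":
    shots = [("What is 2 + 3?", "Adding gives 2 + 3 = 5. Final answer: 5")]
    print(build_few_shot_prompt(shots, "What is 4 * 6?"))
    # A completion would normally come from the model; this one is hand-written.
    print(extract_final_answer("Multiplying gives 4 * 6 = 24. Final answer: 24"))
```

In a setup like this, the worked solutions in the prompt show the model the step-by-step style to follow, and the extracted answer is what would be compared against the benchmark's reference answer.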

Llemma: An Open Language Model For Mathematics

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

Large Language Models for Mathematicians

MathLearner: A Large Language Model Agent Framework for Learning to Solve Mathematical Problems

Code Llama: Open Foundation Models for Code

Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models

Benchmarking Large Language Models for Math Reasoning Tasks

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step

LLaMA: Open and Efficient Foundation Language Models

MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data

Towards a Mathematics Formalisation Assistant using Large Language Models

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning

MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning

Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange