Llemma: An Open Language Model For Mathematics

Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck
2024-03-16
Abstract: We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.
Computation and Language, Artificial Intelligence, Logic in Computer Science
What problem does this paper attempt to address?
The paper's main goal is to develop Llemma, a large language model specialized for mathematics, with stronger capabilities in mathematical problem solving, tool use, and formal theorem proving. Specifically, the authors continue pretraining an existing code language model (Code Llama) on a mathematics-focused dataset (Proof-Pile-2) so that the resulting model performs well at mathematical reasoning and problem solving.

### Main problem decomposition

1. **Improve mathematical reasoning ability**:
   - Continued pretraining on Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, improves the model's performance on mathematical reasoning and problem solving.
   - On the MATH benchmark, Llemma outperforms all known open base models, as well as the unreleased Minerva model suite when compared at equal parameter counts.
2. **Tool use and formal theorem proving**:
   - Without any further finetuning, Llemma can use computational tools such as a Python interpreter and formal theorem provers to solve mathematical problems.
   - The model can generate formal mathematical proofs for interactive proof assistants such as Isabelle and Lean.
3. **Openness and reproducibility**:
   - Unlike previous closed-access mathematical language models, Llemma is openly released together with its training data and code, so that other researchers can build on it.
   - The released artifacts include the 7-billion- and 34-billion-parameter models, the Proof-Pile-2 dataset, and AlgebraicStack, the mathematical code portion of the training data.

### Key points of the solution

- **Dataset**: Proof-Pile-2 is a 55-billion-token mixture of scientific papers, web data containing mathematics, and mathematical code.
- **Model architecture**: Llemma is initialized from Code Llama and further trained on Proof-Pile-2, yielding 7-billion- and 34-billion-parameter models.
- **Evaluation method**: The models are evaluated with few-shot prompting and chain-of-thought reasoning across multiple mathematical benchmarks, where they perform strongly.

Overall, the paper aims to advance mathematical reasoning and problem solving by building a language model optimized specifically for mathematics, and to provide an open platform for future research.
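In the tool-use setting described above, the model writes a program and a Python interpreter runs it to produce the answer. The following is a minimal sketch of the execution step only. It assumes that the model's output is a complete Python program that prints its result. `GENERATED_PROGRAM` is a hypothetical stand-in for a model completion, and `run_generated_program` is an illustrative helper. Neither is part of the Llemma release.

```python
import contextlib
import io

# Hypothetical program text standing in for a model completion (not actual Llemma output).
GENERATED_PROGRAM = """
from fractions import Fraction

# Sum of 1/(k*(k+1)) for k = 1..9 telescopes to 1 - 1/10.
total = sum(Fraction(1, k * (k + 1)) for k in range(1, 10))
print(total)
"""


def run_generated_program(program: str) -> str:
    """Execute model-written Python and return whatever it prints.

    Assumption: the program is trusted. A real evaluation harness would
    sandbox it and enforce a timeout.
    """
    buffer = io.StringIO()
    # Capture anything the program prints so it can be compared to a reference answer.
    with contextlib.redirect_stdout(buffer):
        # A single globals dict lets names imported by the program be seen inside its generator expressions.
        exec(program, {"__name__": "__generated__"})
    return buffer.getvalue().strip()


answer = run_generated_program(GENERATED_PROGRAM)
print(answer)  # prints 9/10
```

Exact rational arithmetic via `fractions.Fraction` is one reason to hand computation to an interpreter: the printed answer is exact rather than a rounded float.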
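The formal theorem proving described above has the model produce proofs that a proof assistant checks mechanically. The small Lean 4 theorem below illustrates what such a statement and tactic proof look like. It is not taken from the paper's benchmarks. The theorem name is hypothetical, and the proof assumes only the core Lean 4 lemmas `Nat.add_comm` and `Nat.mul_comm`, with no Mathlib.

```lean
-- Illustrative formal statement: multiplication distributes the same way
-- regardless of operand order (core Lean 4, no Mathlib assumed).
theorem mul_add_comm_example (a b c : Nat) : a * (b + c) = (c + b) * a := by
  -- Swap the summands so both sides contain `c + b`.
  rw [Nat.add_comm b c]
  -- Swap the factors; the two sides then coincide and `rw` closes the goal by `rfl`.
  rw [Nat.mul_comm]
```

Each tactic transforms the goal, and the proof is accepted only if the assistant confirms the final goal is closed. This check is what makes generated formal proofs verifiable.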
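The few-shot, chain-of-thought evaluation mentioned above works by prepending worked examples to each test problem and then reading the final answer out of the model's step-by-step solution. Reference solutions in the MATH dataset mark their final answer with `\boxed{...}`. The sketch below builds such a prompt and extracts the boxed answer. The `Problem:`/`Solution:` template, the exemplars, and both helper names are hypothetical illustrations, not the paper's actual prompts.

```python
from typing import List, Optional, Tuple

# Hypothetical worked examples; the paper's actual few-shot prompts are not reproduced here.
EXEMPLARS: List[Tuple[str, str]] = [
    ("What is $3 \\cdot 4 + 5$?",
     "First, $3 \\cdot 4 = 12$. Then $12 + 5 = 17$. The final answer is $\\boxed{17}$."),
    ("Solve $2x = 10$ for $x$.",
     "Dividing both sides by 2 gives $x = 5$. The final answer is $\\boxed{5}$."),
]


def build_few_shot_prompt(question: str,
                          exemplars: List[Tuple[str, str]] = EXEMPLARS) -> str:
    """Concatenate worked examples, then the new question with an open solution slot."""
    blocks = [f"Problem:\n{q}\n\nSolution:\n{s}" for q, s in exemplars]
    blocks.append(f"Problem:\n{question}\n\nSolution:\n")
    return "\n\n".join(blocks)


def extract_boxed_answer(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, handling nested braces."""
    start = completion.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    chars = []
    while i < len(completion):
        ch = completion[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # unbalanced braces: no complete answer


prompt = build_few_shot_prompt("What is $7 - 2$?")
# Hypothetical model completion for the final problem in the prompt.
completion = "We compute $7 - 2 = 5$. The final answer is $\\boxed{5}$."
print(extract_boxed_answer(completion))  # prints 5
```

The brace counter matters because answers such as `\frac{1}{2}` contain nested braces that a simple "up to the first `}`" rule would truncate.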