Abstract: In recent years, Large Language Models such as GPT-3 have shown remarkable capabilities in performing NLP tasks in the zero- and few-shot settings. On the other hand, the experiments highlighted the difficulty GPT-3 has in carrying out tasks that require a certain degree of reasoning, such as arithmetic operations. In this paper we evaluate the ability of Transformer Language Models to perform arithmetic operations following a pipeline that, before performing computations, decomposes numbers into units, tens, and so on. We denote the models fine-tuned with this pipeline with the name Calculon and we test them in the task of performing additions, subtractions and multiplications on the same test sets as GPT-3. Results show an increase in accuracy of 63% in the five-digit addition task. Moreover, we demonstrate the importance of the decomposition pipeline introduced, since fine-tuning the same Language Model without decomposing numbers results in 0% accuracy in the five-digit addition task.

What problem does this paper attempt to address?

The paper explores how to improve the performance of Transformer-based large language models on arithmetic tasks. Specifically, the authors address the difficulties that existing large language models (such as GPT-3) encounter on tasks requiring a certain level of reasoning ability, particularly their poor performance on addition and subtraction involving numbers with more than 5 digits. They propose a new method: training the model on numbers decomposed into units, tens, and so on. The main contributions of the paper are as follows:

1. **Proposed the Calculon model**: This is a model based on GPT-2, fine-tuned with a specially designed preprocessing step (number decomposition) to enhance its ability to perform arithmetic operations.
2. **Validated the effectiveness of the decomposition method**: Comparative experiments show that, without number decomposition, the same GPT-2 model can hardly perform addition and subtraction on 4- to 5-digit numbers correctly; with the decomposition method, the model's accuracy improves significantly.
3. **Explored the effects of different methods**: In addition to number decomposition, another method called "space splitting" was studied and compared with the baseline model. The results indicate that number decomposition is the more effective of the two.
4. **Impact on GPT-3**: The authors attempted to use number decomposition to improve GPT-3's performance on arithmetic operations, but found that the method did not achieve the expected results on GPT-3 and even reduced the model's performance.

In summary, the paper introduces a number decomposition method that significantly improves the accuracy of Transformer-based language models on arithmetic operations, particularly multi-digit addition and subtraction. For the more complex multiplication operations, however, even number decomposition did not achieve the desired improvement.
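To make the two input representations discussed above concrete, here is a minimal Python sketch of number decomposition and of space splitting. The function names (`decompose`, `recompose`, `space_split`), the place-value wording, and the units-first ordering are assumptions made for illustration; the summary does not specify the exact text format used to fine-tune Calculon.

```python
# Illustrative place-value names; the exact wording used in the paper is an assumption here.
PLACE_NAMES = ["units", "tens", "hundreds", "thousands", "tens of thousands"]


def decompose(n: int) -> str:
    """Spell out a non-negative integer as place-value terms, least significant first."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    parts = []
    for power, digit in enumerate(reversed(str(n))):
        # Fall back to a generic "10^k" label beyond the named places.
        name = PLACE_NAMES[power] if power < len(PLACE_NAMES) else f"10^{power}"
        parts.append(f"{digit} {name}")
    return ", ".join(parts)


def recompose(text: str) -> int:
    """Invert decompose(): turn the place-value string back into an integer."""
    total = 0
    for part in text.split(", "):
        digit, name = part.split(" ", 1)
        power = PLACE_NAMES.index(name) if name in PLACE_NAMES else int(name[3:])
        total += int(digit) * 10 ** power
    return total


def space_split(n: int) -> str:
    """Space-splitting variant: separate the digits with spaces."""
    return " ".join(str(n))


if __name__ == "__main__":
    print(decompose(342))              # 2 units, 4 tens, 3 hundreds
    print(space_split(342))            # 3 4 2
    print(recompose(decompose(342)))   # 342
```

In both representations each digit becomes a separate piece of text, but only the decomposition attaches an explicit place value to it, which may be one reason the summary reports it as the more effective of the two.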

Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition

How well do Large Language Models perform in Arithmetic tasks?

Investigating the Limitations of Transformers with Simple Arithmetic Tasks

Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines

Dissecting Multiplication in Transformers: Insights into LLMs

Arithmetic with language models: From memorization to computation

Transformers Can Do Arithmetic with the Right Embeddings

Scaling Behavior for Large Language Models regarding Numeral Systems: An Example using Pythia

How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs

Teaching Arithmetic to Small Transformers

Probing for Multilingual Numerical Understanding in Transformer-Based Language Models

Uncovering the Interpretation of Large Language Models

Positional Description Matters for Transformers Arithmetic

Attending to Mathematical Language with Transformers

Language Models Do Hard Arithmetic Tasks Easily and Hardly Do Easy Arithmetic Tasks

Interpreting and Improving Large Language Models in Arithmetic Calculation

Can Language Models Rival Mathematics Students? Evaluating Mathematical Reasoning through Textual Manipulation and Human Experiments

GPT Can Solve Mathematical Problems Without a Calculator

MetaRuleGPT: Recursive Numerical Reasoning of Language Models Trained with Simple Rules

Transformers discover an elementary calculation system exploiting local attention and grid-like problem representation