Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition

Matteo Muffo, Aldo Cocco, Enrico Bertino
2023-04-21
Abstract: In recent years, Large Language Models such as GPT-3 have shown remarkable capabilities in performing NLP tasks in zero- and few-shot settings. However, experiments have also highlighted GPT-3's difficulty with tasks that require a certain degree of reasoning, such as arithmetic operations. In this paper we evaluate the ability of Transformer Language Models to perform arithmetic operations following a pipeline that, before performing computations, decomposes numbers into units, tens, and so on. We call the models fine-tuned with this pipeline Calculon and test them on additions, subtractions, and multiplications using the same test sets used for GPT-3. Results show an increase in accuracy of 63% on the five-digit addition task. Moreover, we demonstrate the importance of the decomposition pipeline, since fine-tuning the same Language Model without decomposing numbers results in 0% accuracy on the five-digit addition task.
Computation and Language, Machine Learning
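The pipeline described in the abstract, which decomposes operands into units, tens, and so on before computing, can be illustrated with a minimal Python sketch. It is a deterministic reference for the kind of intermediate strings a model could be trained to produce, not the model itself. The place names, the units-first ordering, the comma-separated format, and the helper names `decompose`, `add_by_place`, and `recompose` are hypothetical and may not match the paper's actual dataset format.

```python
# Hypothetical sketch: decompose -> add place by place -> recompose.
# The place names and string format are illustrative assumptions,
# not the paper's exact dataset format.
PLACE_NAMES = ["units", "tens", "hundreds", "thousands",
               "tens of thousands", "hundreds of thousands"]


def _spell(digits_units_first: list[int]) -> str:
    """Render digits (units first) as 'digit place' pairs."""
    if len(digits_units_first) > len(PLACE_NAMES):
        raise ValueError("number has more places than PLACE_NAMES covers")
    return ", ".join(f"{d} {PLACE_NAMES[i]}"
                     for i, d in enumerate(digits_units_first))


def _digits(n: int) -> list[int]:
    """Digits of a non-negative integer, units first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return [int(c) for c in reversed(str(n))]


def decompose(n: int) -> str:
    """Spell n as its digit/place pairs, e.g. 1201 -> '1 units, 0 tens, ...'."""
    return _spell(_digits(n))


def add_by_place(a: int, b: int) -> str:
    """Add a and b one place at a time with an explicit carry and return the
    decomposed sum (a deterministic reference, not a language model)."""
    da, db = _digits(a), _digits(b)
    out, carry = [], 0
    for i in range(max(len(da), len(db))):
        s = (da[i] if i < len(da) else 0) + (db[i] if i < len(db) else 0) + carry
        out.append(s % 10)   # digit that stays in this place
        carry = s // 10      # overflow passed on to the next place
    if carry:
        out.append(carry)
    return _spell(out)


def recompose(text: str) -> int:
    """Invert the spelling: rebuild the integer from its 'digit place' pairs."""
    total = 0
    for pair in text.split(", "):
        digit, place = pair.split(" ", 1)  # place names may contain spaces
        total += int(digit) * 10 ** PLACE_NAMES.index(place)
    return total


if __name__ == "__main__":
    print(decompose(1201))                      # 1 units, 0 tens, 2 hundreds, 1 thousands
    print(add_by_place(1201, 1302))             # 3 units, 0 tens, 5 hundreds, 2 thousands
    print(recompose(add_by_place(1201, 1302)))  # 2503
```

In this representation each output place depends only on the same place in both operands plus a single carry, which is one plausible reading of why decomposition makes the task easier to learn. The sketch covers addition of non-negative integers only.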
What problem does this paper attempt to address?
The paper explores how to improve the performance of Transformer-based large language models on arithmetic tasks. Specifically, the authors address the difficulties that existing large language models such as GPT-3 face on tasks requiring a degree of reasoning, in particular their poor accuracy on addition and subtraction with four- and five-digit numbers. They propose fine-tuning the model on data in which numbers are first decomposed into units, tens, and so on. The main contributions of the paper are:

1. **Proposed the Calculon model**: a GPT-2-based model fine-tuned with a dedicated preprocessing step (number decomposition) to improve its ability to perform arithmetic operations.
2. **Validated the effectiveness of the decomposition method**: comparative experiments show that, without number decomposition, the same GPT-2 model almost never performs four- and five-digit addition and subtraction correctly, whereas with decomposition its accuracy improves significantly.
3. **Explored alternative methods**: besides number decomposition, the authors study a method called "space splitting" and compare both against the baseline model. The results indicate that number decomposition is the more effective of the two.
4. **Impact on GPT-3**: the authors also tried applying number decomposition to GPT-3, but it did not yield the expected improvement and even reduced the model's performance.

In summary, the paper introduces a number decomposition method that significantly improves the accuracy of Transformer-based language models on arithmetic, particularly on multi-digit addition and subtraction.
However, for more complex multiplication operations, even the number decomposition method did not achieve the desired improvement.
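Items 2 and 3 contrast three ways of presenting operands to the model: plain digits (the baseline), space splitting, and number decomposition. The Python sketch that follows renders one addition prompt in each style. The function names, the operator word `plus`, the units-first ordering, and the place names are illustrative assumptions, not the paper's documented format.

```python
# Hypothetical illustration of three operand formats: plain digits (baseline),
# space splitting, and number decomposition. The exact strings used in the
# paper's datasets may differ.
PLACE_NAMES = ["units", "tens", "hundreds", "thousands",
               "tens of thousands", "hundreds of thousands"]


def baseline(n: int) -> str:
    # Plain digits; the tokenizer decides how to split them into tokens.
    return str(n)


def space_split(n: int) -> str:
    # One digit per whitespace-separated chunk, most significant digit first.
    return " ".join(str(n))


def decomposed(n: int) -> str:
    # Digit/place pairs, units first (an assumed ordering).
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    digits = str(n)[::-1]
    if len(digits) > len(PLACE_NAMES):
        raise ValueError("number has more places than PLACE_NAMES covers")
    return ", ".join(f"{d} {PLACE_NAMES[i]}" for i, d in enumerate(digits))


def render(a: int, b: int, op: str, fmt) -> str:
    # Build an operation prompt such as '1201 plus 1302' in the chosen format.
    return f"{fmt(a)} {op} {fmt(b)}"


if __name__ == "__main__":
    for fmt in (baseline, space_split, decomposed):
        print(f"{fmt.__name__}: {render(1201, 1302, 'plus', fmt)}")
    # baseline: 1201 plus 1302
    # space_split: 1 2 0 1 plus 1 3 0 2
    # decomposed: 1 units, 0 tens, 2 hundreds, 1 thousands plus 2 units, 0 tens, 3 hundreds, 1 thousands
```

Space splitting puts each digit in its own whitespace-separated chunk, which typically keeps a byte-level BPE tokenizer such as GPT-2's from merging several digits into a single token; decomposition goes further by labelling every digit with its place value.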