Reducing the Energy Dissipation of Large Language Models (LLMs) with Approximate Memories

Zhen Gao, Jie Deng, Pedro Reviriego, Shanshan Liu, Fabrizio Lombardi
DOI: https://doi.org/10.1109/iscas58744.2024.10558275
2024-01-01
Abstract: Large language models (LLMs) have shown impressive performance in a wide range of tasks, such as answering questions or summarizing text. However, running LLMs on edge devices is challenging because their memory and computation needs require large amounts of energy. In LLMs, most of the memory is used to store the model parameters, whose number keeps increasing from one LLM generation to the next. In the last several years, significant efforts have been made to compress and prune parameters, but this is not enough to reduce memory needs as the number of parameters grows exponentially. In this work, to reduce energy dissipation, rather than trying to reduce the amount of memory used by LLMs, we study the use of approximate memories to store the LLM parameters. Approximate memories can significantly reduce energy dissipation at the cost of introducing errors in some of the memory bits. Therefore, the impact of these errors on LLMs must be understood. To that end, we performed error injection on different compressed versions of a classic LLM: Bidirectional Encoder Representations from Transformers (BERT). The results show that, in some cases, compressed BERTs operate reliably at high bit error rates. This makes it possible to use approximate memories with a negligible impact on LLM performance and a significant reduction in energy dissipation.
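The abstract does not say how errors were injected. A common model, assumed here, is that each stored bit flips independently with probability equal to the bit error rate (BER), and that parameters are stored as IEEE 754 32-bit floats. The following is a minimal sketch of that kind of bit-flip injection, not the authors' procedure. The helper names `flip_bit` and `inject_errors` and the toy parameter values are hypothetical.

```python
import random
import struct

FLOAT32_BITS = 32  # assumption: parameters are stored as IEEE 754 single precision


def _to_bits(value: float) -> int:
    """Reinterpret a value (rounded to float32) as its raw 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def _from_bits(raw: int) -> float:
    """Reinterpret a raw 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", raw))[0]


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a float32 value (bit 0 = mantissa LSB, bit 31 = sign)."""
    return _from_bits(_to_bits(value) ^ (1 << bit))


def inject_errors(weights, ber, seed=None):
    """Simulate reading `weights` from an approximate memory (hypothetical model).

    Each of the 32 stored bits of every value is flipped independently with
    probability `ber`. Returns the corrupted values and the number of flips.
    """
    rng = random.Random(seed)
    corrupted, total_flips = [], 0
    for w in weights:
        # Build this word's error mask first, then apply it with a single XOR.
        mask = 0
        for bit in range(FLOAT32_BITS):
            if rng.random() < ber:
                mask |= 1 << bit
        total_flips += bin(mask).count("1")
        corrupted.append(_from_bits(_to_bits(w) ^ mask))
    return corrupted, total_flips


if __name__ == "__main__":
    params = [0.5, -1.25, 0.0, 3.0]  # hypothetical toy parameters
    noisy, flips = inject_errors(params, ber=1e-2, seed=0)
    print(f"{flips} bits flipped out of {FLOAT32_BITS * len(params)}")
    print(noisy)
```

The position of a flipped bit matters a great deal: `flip_bit(0.5, 0)` changes the value only by 2**-24, while `flip_bit(0.5, 30)`, which flips the most significant exponent bit, turns 0.5 into 2**127. The abstract does not specify the number format of the compressed BERT models; quantized formats would need a different bit layout than the float32 one assumed here.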