Hardware-oriented Algorithms for Softmax and Layer Normalization of Large Language Models

Wenjie Li, Dongxu Lyu, Gang Wang, Aokun Hu, Ningyi Xu, Guanghui He
DOI: https://doi.org/10.1007/s11432-024-4137-4
2024-01-01
Science China Information Sciences
Abstract: While large language models (LLMs) have sparked a new revolution in the field of natural language processing (NLP), their hardware accelerators have garnered tremendous attention. However, softmax and layer normalization, which are the most common non-linear operations in LLMs, are frequently overlooked. This paper presents hardware-oriented algorithms for both softmax and layer normalization of LLMs. We propose an approximate approach to implementing division in softmax and extend it to simultaneously compute the square root and perform the division in layer normalization. It replaces the original computation with multiplication and shifting. For softmax, we further approximate the exponential function by truncating its exponent and then reuse the involved subtraction. For layer normalization, we additionally simplify the computation of the denominator by directly removing the term involving the square of the mean. Furthermore, hardware architectures are developed for the proposed softmax and layer normalization algorithms. They can work as plug-and-play units for LLM accelerators, requiring no fine-tuning and introducing negligible performance loss. Compared with the state-of-the-art designs, the proposed softmax architecture can save up to 23.45
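The layer-normalization simplification mentioned in the abstract can be illustrated with a small floating-point sketch. It assumes the usual identity Var(x) = E[x^2] - (E[x])^2 and drops the squared-mean term from the denominator only. The function names, the `eps` value, and the sample vector are hypothetical, and the sketch does not model the paper's multiply-and-shift arithmetic or its hardware architecture.

```python
import math


def layer_norm_exact(x, eps=1e-5):
    """Reference layer normalization (no learned scale or bias)."""
    n = len(x)
    mean = sum(x) / n
    mean_sq = sum(v * v for v in x) / n
    var = mean_sq - mean * mean  # Var(x) = E[x^2] - (E[x])^2
    denom = math.sqrt(var + eps)
    return [(v - mean) / denom for v in x]


def layer_norm_simplified(x, eps=1e-5):
    """Same as above, but the squared-mean term is dropped from the
    denominator (an assumed reading of the abstract's simplification)."""
    n = len(x)
    mean = sum(x) / n
    mean_sq = sum(v * v for v in x) / n
    denom = math.sqrt(mean_sq + eps)  # (E[x])^2 removed
    return [(v - mean) / denom for v in x]


# Hypothetical activation vector whose mean is small relative to its spread.
sample = [0.5, -1.0, 1.5, -0.9, 0.1]
exact = layer_norm_exact(sample)
approx = layer_norm_simplified(sample)
max_abs_diff = max(abs(a - b) for a, b in zip(exact, approx))
print(max_abs_diff)
```

The two outputs stay close only when the mean is small compared with the root mean square of the inputs. For inputs with a large offset the simplified denominator is noticeably larger than the exact one, so the sketch says nothing about how the paper bounds that error.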