TEA-S: A Tiny and Efficient Architecture for PLAC-Based Softmax in Transformers

Zhengyu Mei, Hongxi Dong, Yuxuan Wang, Hongbing Pan
DOI: https://doi.org/10.1109/tcsii.2023.3265710
2023-01-01
IEEE Transactions on Circuits & Systems II Express Briefs
Abstract: With the popularity of Transformer neural networks, hardware accelerators inevitably have to perform nonlinear computation, mainly in the form of the softmax operation. However, striking a better compromise between algorithm performance and hardware overhead remains a constant challenge. Hence, this brief proposes a tiny and efficient architecture named TEA-S that implements the softmax function with a universal approximation scheme based on Piecewise Linear Approximation Computation (PLAC). With the first co-optimization of calculation and memory, TEA-S better achieves the design goals of tiny area and high efficiency. The implementation results show that the peak efficiency when processing 8-bit quantized data is 487.51 Gps/(mm$^{2}\cdot$mW), with a tiny area of $3052.43~\mu\mathrm{m}^{2}$ at a frequency of 0.5 GHz under 90-nm CMOS technology. Moreover, TEA-S offers a universal solution for input sequences of any length, with negligible accuracy loss in Transformers compared to the quantized baselines.
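To make the idea of a piecewise-linear softmax concrete, here is a minimal Python sketch. It is not the TEA-S datapath or the PLAC segmentation scheme from the brief. It only illustrates the general technique of replacing exp with a small table of line segments. The function names, the input range [-8, 0], the uniform segment widths, the chord-based fit, and the flush-to-zero behavior below the range are all assumptions chosen for illustration.

```python
import math

def build_pwl_table(lo=-8.0, hi=0.0, segments=8):
    """Fit exp on [lo, hi] with uniform chords (hypothetical helper).

    Returns the range start, the segment width, and one
    (slope, intercept) pair per segment.
    """
    width = (hi - lo) / segments
    table = []
    for i in range(segments):
        x0 = lo + i * width
        x1 = x0 + width
        slope = (math.exp(x1) - math.exp(x0)) / width
        intercept = math.exp(x0) - slope * x0
        table.append((slope, intercept))
    return lo, width, table

def pwl_exp(x, lo, width, table):
    """Approximate exp(x) for x <= 0 with one multiply and one add."""
    if x < lo:
        return 0.0  # assumption: inputs below the range are flushed to zero
    idx = min(int((x - lo) / width), len(table) - 1)
    slope, intercept = table[idx]
    return slope * x + intercept

def pwl_softmax(xs, segments=8):
    """Softmax with exp replaced by the piecewise-linear table."""
    lo, width, table = build_pwl_table(segments=segments)
    m = max(xs)  # subtracting the max keeps every argument in (-inf, 0]
    exps = [pwl_exp(x - m, lo, width, table) for x in xs]
    total = sum(exps)  # >= 1, because the max element maps to exp(0) = 1
    return [e / total for e in exps]

def exact_softmax(xs):
    """Reference softmax, used only for comparison."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

Calling `pwl_softmax([0.3, -1.2, 2.5, 0.0], segments=64)` and comparing it with `exact_softmax` on the same input shows how the approximation error shrinks as the number of segments grows. A hardware design would additionally use fixed-point arithmetic and a carefully chosen segmentation, which this sketch does not attempt.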