A 22nm 54.94TFLOPS/W Transformer Fine-Tuning Processor with Exponent-Stationary Re-Computing, Aggressive Linear Fitting, and Logarithmic Domain Multiplicating

Yang Wang, Xiaolong Yang, Yubin Qin, Zhiren Zhao, Ruiqi Guo, Zhiheng Yue, Huiming Han, Shaojun Wei, Yang Hu, Shouyi Yin
DOI: https://doi.org/10.1109/vlsitechnologyandcir46783.2024.10631541
2024-01-01
Abstract: This paper proposes a Transformer processor that supports energy-efficient fine-tuning through multi-level optimizations at the batch, iteration, and matrix levels. It has three key features: 1) An exponent-stationary re-computing scheduler (ESRC) reduces the storage requirement for each batch by 44.2%. 2) An aggressive linear fitting unit (ALFU) saves 47.4% of the computations in each iteration. 3) A logarithmic domain processing element (LDPE) reduces the energy of matrix multiplications (MM) in fine-tuning by 36.3%. The proposed processor achieves an energy efficiency of 54.94TFLOPS/W. It reduces fine-tuning energy by 4.27× and offers a 3.57× speedup for GPT-2.
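The abstract does not describe how the LDPE works internally. As a general illustration of why logarithmic-domain arithmetic can cut multiplication cost, the sketch below uses Mitchell's classic approximation, log2(2^k · (1 + f)) ≈ k + f, so that a multiply becomes an add of approximate logarithms. This is an assumption chosen for illustration, not the paper's design; the function names `approx_log2`, `approx_exp2`, and `log_domain_mul` are hypothetical.

```python
import math

def approx_log2(x: float) -> float:
    """Mitchell's approximation of log2(x) for x > 0 (illustrative only)."""
    m, e = math.frexp(x)      # x = m * 2**e with 0.5 <= m < 1
    k = e - 1                 # rewrite as x = (1 + f) * 2**k
    f = 2.0 * m - 1.0         # fractional part, 0 <= f < 1
    return k + f              # log2(1 + f) is approximated by f

def approx_exp2(y: float) -> float:
    """Inverse of approx_log2: 2**y is approximated by (1 + frac) * 2**floor."""
    k = math.floor(y)
    f = y - k
    return math.ldexp(1.0 + f, k)

def log_domain_mul(a: float, b: float) -> float:
    """Multiply by adding approximate logs; the sign is handled separately."""
    if a == 0.0 or b == 0.0:
        return 0.0
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    return sign * approx_exp2(approx_log2(abs(a)) + approx_log2(abs(b)))

if __name__ == "__main__":
    for a, b in [(2.0, 4.0), (3.0, 5.0), (-1.5, 1.5)]:
        print(a, b, a * b, log_domain_mul(a, b))
```

Powers of two multiply exactly under this scheme, while other operands carry a bounded relative error (about 11.1% in the worst case for Mitchell's method). A hardware unit would trade that error against the savings from replacing a multiplier with an adder; how the LDPE handles this trade-off is not stated in the abstract.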