A 28-nm 28.8-TOPS/W Attention-Based NN Processor with Correlative CIM Ring Architecture and Dataflow-Reshaped Digital-Assisted CIM Array
Ruiqi Guo, Zhiheng Yue, Yang Wang, Hao Li, Te Hu, Yabing Wang, Hao Sun, Jeng-Long Hsu, Yaojun Zhang, Bonan Yan, Leibo Liu, Ru Huang, Shaojun Wei, Shouyi Yin
DOI: https://doi.org/10.1109/jssc.2024.3419808
IF: 5.4
2024-01-01
IEEE Journal of Solid-State Circuits
Abstract: Transformer models have achieved impressive performance in various applications by effectively capturing contextual knowledge from the entire sequence. However, the multi-headed self-attention (MHSA) mechanism of Transformer models introduces multiple rounds of matrix multiplication (MM) and Softmax operations, which results in massive data movement and computation. Compute-in-memory (CIM) is a promising candidate for reducing data movement in the memory hierarchy of artificial intelligence (AI) accelerators, increasing the speed and energy efficiency of MM computation. However, the attention mechanism introduces dynamic MMs involving Query ($Q$), Key ($K$), and Value ($V$). Since these matrices are all generated dynamically by previous layers, the dynamic MMs mismatch the CIM paradigm, which computes against operands already stored in the memory cells, resulting in significant energy and latency costs. This article proposes a CIM-based transformer accelerator (TranCIM) with three design features that effectively handle dynamic MMs. First, a correlative CIM ring (CRCIMR) executes the dynamic MM involving $Q$ and $K$ by matrix decomposition, eliminating the need to load a dynamically generated matrix into SRAM-based CIM (SRAM-CIM) cells. Second, a Softmax-based speculation unit (SSU) reduces computational redundancy in the dynamic MMs. Third, a digital-assisted CIM array (DACIMA) executes the dynamic MM involving $V$ based on symmetrical-L-shaped products, allowing the CIM macro to operate in compute mode continuously. Fabricated in a 28-nm CMOS technology, the proposed accelerator occupies an area of 7.08 mm$^2$. Measured on TinyBERT and BERT-Base with INT8 precision, it achieves a system-level energy efficiency of 28.8 TOPS/W.
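The abstract does not describe the internals of CRCIMR, SSU, or DACIMA. As an illustration only, the following Python sketch assumes standard single-head scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, and marks which MMs are static (one operand is a trained weight that could be pre-stored in CIM cells) and which are dynamic (both operands are computed at run time). It also shows a generic softmax-margin test of the kind a speculation unit could exploit, and converts the reported 28.8 TOPS/W into energy per operation. The helper names (`attention`, `speculative_keep`, `energy_per_op_fj`) and the margin rule are hypothetical and are not the paper's design.

```python
import math

def matmul(a, b):
    """Dense product of two list-of-lists matrices (a is n x k, b is k x m)."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def softmax(row):
    m = max(row)  # subtract the row max for numerical stability
    e = [math.exp(v - m) for v in row]
    total = sum(e)
    return [v / total for v in e]

def attention(x, w_q, w_k, w_v):
    """Single-head attention, softmax(Q K^T / sqrt(d)) V.

    Static MMs: x @ w_q, x @ w_k, x @ w_v (weights are fixed after training).
    Dynamic MMs: Q @ K^T and P @ V (both operands are produced at run time,
    so neither can be pre-stored in the memory array before inference).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d = len(w_q[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]
    probs = [softmax(row) for row in scores]
    return matmul(probs, v), scores

def speculative_keep(scores_row, margin):
    """Hypothetical softmax-based pruning rule (not the paper's SSU).

    A score more than `margin` below the row max receives a softmax weight
    smaller than exp(-margin) times the largest weight, so it is a skip candidate.
    """
    m = max(scores_row)
    return [s >= m - margin for s in scores_row]

def energy_per_op_fj(tops_per_watt):
    # 1 TOPS/W = 1e12 op/J, so energy per op = 1 / (TOPS/W * 1e12) J
    # = 1000 / (TOPS/W) fJ.
    return 1e3 / tops_per_watt

# Identity weights, so Q = K = V = x in this toy example.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_id = [[1.0, 0.0], [0.0, 1.0]]
out, scores = attention(x, w_id, w_id, w_id)
print(speculative_keep(scores[0], margin=0.5))  # [True, False, True]
print(round(energy_per_op_fj(28.8), 1))         # 34.7
```

Under this assumption, 28.8 TOPS/W corresponds to roughly 34.7 fJ per operation; the margin test only shows why keys whose scores fall well below the row maximum contribute little after Softmax, not how the SSU predicts them in hardware.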