Monolithic 3D Integration of Analog RRAM-Based Fully Weight Stationary and Novel CFET 2T0C-Based Partially Weight Stationary for Accelerating Transformer

H. Yang, Y. Li, J. Tang, R. An, Y. Zhang, L. Gao, N. Gao, H. Xu, Y. Du, Z. Liu, X. Ma, G. Wang, C. Zhao, J. Xiang, J. Zhao, W. Bu, K. Zheng, J. Kang, B. Gao, H. Qian, H. Wu
DOI: https://doi.org/10.1109/vlsitechnologyandcir46783.2024.10631548
2024-01-01
Abstract: To accelerate transformers, whose core computations are linear projection and matrix multiplication, we present the M3D-SFP chip architecture, built by monolithic 3D integration of Si-CMOS logic, analog resistive random-access memory (RRAM)-based fully weight stationary (FWS) computing, and complementary FET (CFET) 2T0C-based partially weight stationary (PWS) computing. The FWS part handles linear projections, whose weight matrices are fixed, to capitalize on the highly efficient matrix-vector multiplication (MVM) of analog RRAM. The PWS part serves as the data buffer and handles attention mechanisms, whose matrices are dynamic, to exploit the easy writing of the CFET 2T0C. The novel CFET 2T0C is designed to output a digital voltage rather than a current, which minimizes the cost of peripheral circuits for computing-in-memory (CIM). Furthermore, high-density interlayer vias (ILVs) give the M3D-SFP enhanced bandwidth across layers, which significantly boosts data transfer speed. The functional integrity of the prototype M3D-SFP chip is corroborated by electrical tests on the 128Kb analog RRAM array and the CFET 2T0C macro with 164 transistors, while the performance benchmark reveals a 12.9× speed-up over its 2D counterpart.
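The split between fixed and dynamic matrices described in the abstract can be illustrated with a minimal software sketch of one attention head. This is not the chip's implementation: the function names (`matvec`, `softmax`, `attention_head`), the single-head layout, and the scaled dot-product form are assumptions chosen for illustration. The comments only mark which steps use weights that are fixed after training (the kind the abstract assigns to FWS) and which use matrices that change with every input (the kind it assigns to PWS).

```python
import math


def matvec(matrix, vector):
    """Matrix-vector multiplication (MVM): one dot product per matrix row."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(tokens, w_q, w_k, w_v):
    """Hypothetical single attention head, written to expose the two matrix types."""
    # Linear projections: w_q, w_k, w_v are fixed weight matrices,
    # so they can stay resident in memory (fully weight stationary).
    queries = [matvec(w_q, t) for t in tokens]
    keys = [matvec(w_k, t) for t in tokens]
    values = [matvec(w_v, t) for t in tokens]

    # Attention: keys and values are recomputed for every input sequence,
    # so the stored operand must be rewritten often (partially weight stationary).
    scale = 1.0 / math.sqrt(len(keys[0]))
    values_t = [list(col) for col in zip(*values)]  # transpose to (dim x tokens)
    outputs = []
    for q in queries:
        scores = [s * scale for s in matvec(keys, q)]  # one row of Q K^T
        probs = softmax(scores)
        outputs.append(matvec(values_t, probs))  # weighted sum of value vectors
    return outputs


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 0.0], [0.0, 1.0]]
    for row in attention_head(tokens, identity, identity, identity):
        print(row)
```

In this sketch the projection weights are written once and reused for every token, while `keys` and `values` are regenerated per sequence, which is the property that makes easy writing matter for the attention step.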