HiT-CIM: A High-Throughput Compute-In-Memory SRAM Architecture with Simultaneous Weight Loading/Computing and Balance Capabilities
Junzhan Liu, Sifan Sun, Liang Zhang, Lichuan Luo, Liang Ran, He Zhang, Wang Kang, Weisheng Zhao
DOI: https://doi.org/10.1109/tetc.2024.3471176
2024-01-01
IEEE Transactions on Emerging Topics in Computing
Abstract: In the post-Moore era, compute-in-memory (CIM) techniques are promising candidates for breaking the memory wall. In particular, SRAM-based CIMs (SRAM-CIMs) have attracted widespread attention owing to their good scalability with advanced process nodes. At present, a rich variety of works focus on improving energy efficiency, either by designing different bit-cell structures or by optimizing circuit/chip architectures. However, because CIM inherently stores one of the operands in the memory bit-cells, computation must be suspended while operands are loaded, which wastes substantial computing resources. In this paper, a high-throughput SRAM-CIM (HiT-CIM) architecture with simultaneous weight loading and computing capabilities is proposed by integrating on-chip nonvolatile MRAM (magnetic random-access memory). Both the mainstream current-domain and charge-domain SRAM bit-cell structures are optimized to support this architecture. Furthermore, a reconfigurable, fully pipelined MRAM is designed to provide fast data loading in HiT-CIM, so that the weight-loading strategy can be rapidly fine-tuned for different neural network models. An optimal evaluation and configuration strategy is then proposed to improve macro-level performance by considering the key components and parameters: the SRAM array, ADC, MRAM structure, and frequency. Finally, the feasibility of HiT-CIM is verified in a 40-nm foundry process. The results show that a multiple-fold speed improvement can be obtained on VGG19, ResNet18, and MobileNetV1. Specifically, the area efficiency of HiT-CIM on VGG19 reaches 1124 GOPS/mm² and 1880.12 GOPS/mm² for the current-domain and charge-domain SRAM-CIMs, respectively. An improvement of up to 5.3× is realized compared with prior works.
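The benefit of overlapping weight loading with computing can be illustrated with a simple two-stage timing model. The sketch below is not taken from the paper: the function names, the tile-based abstraction, and the cycle counts are all hypothetical, and it assumes the next tile's weights can be loaded while the current tile is being computed.

```python
def serial_time(load_cycles: int, compute_cycles: int, num_tiles: int) -> int:
    """Total cycles when computing is suspended during every weight load."""
    return num_tiles * (load_cycles + compute_cycles)


def overlapped_time(load_cycles: int, compute_cycles: int, num_tiles: int) -> int:
    """Total cycles when the load of tile i+1 overlaps the compute of tile i.

    The first load and the last compute cannot be hidden; every step in
    between is bounded by the slower of the two stages.
    """
    if num_tiles <= 0:
        return 0
    steady_state = (num_tiles - 1) * max(load_cycles, compute_cycles)
    return load_cycles + steady_state + compute_cycles


def speedup(load_cycles: int, compute_cycles: int, num_tiles: int) -> float:
    """Ratio of serial to overlapped execution time (hypothetical model)."""
    return serial_time(load_cycles, compute_cycles, num_tiles) / overlapped_time(
        load_cycles, compute_cycles, num_tiles
    )


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from HiT-CIM.
    for load, compute in [(100, 100), (300, 100), (50, 200)]:
        print(load, compute, round(speedup(load, compute, num_tiles=10), 2))
```

In this model the gain approaches 2× when loading and computing take equal time and shrinks as one stage dominates, which is why balancing the two stages matters for throughput.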