A 16.38TOPS and 4.55POPS/W SRAM Computing-in-Memory Macro for Signed Operands Computation and Batch Normalization Implementation

Xin Qiao, Qingyu Guo, Xiyuan Tang, Jiahao Song, Renjie Wei, Meng Li, Runsheng Wang, Yuan Wang
DOI: https://doi.org/10.1109/tcsi.2024.3353464
2024-01-01
Abstract: Edge artificial intelligence applications impose rigorous demands on local hardware to improve throughput and energy efficiency. Computing-in-memory (CIM) architectures provide highly parallel and energy-efficient solutions to accelerate the multiply-and-accumulate (MAC) operations in neural networks (NNs). While SRAM-based charge-domain CIM is achieving thousands of TOPS/W energy efficiency, it encounters limitations in full NN model deployments where both activations and weights are signed. This paper proposes an SRAM-based signed batch normalization (BN) CIM macro that supports efficient bitwise sparse MAC computation with signed operands and BN operations in deep neural networks. The key features of this macro are: 1) a multibit weight unit that optimizes bitstream sparsity and handles the sign-bit computation, 2) a 2b-serial input configuration that increases throughput and amortizes ADC energy, and 3) a quantization-hardware co-design for the BN implementation. Measurement results show that the proposed 28 nm 64 Kb CIM macro achieves 16.38 TOPS throughput and 4.55 POPS/W energy efficiency, both normalized to 1b operands. With a ResNet18 model using the co-designed BN implementation at signed-8b precision for activations and weights, the test accuracy on CIFAR10 is 92%.
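To make the idea of a signed, bit-serial MAC concrete, the following is a minimal arithmetic sketch, not the macro's circuit. It assumes two's-complement operands, where the most significant bit carries a negative weight, and it feeds activation bits in groups of two per cycle to mirror the 2b-serial input. The function names (`to_twos_complement_bits`, `bit_weight`, `bitwise_signed_mac`) and the `bits_per_cycle` parameter are hypothetical; the paper's multibit weight unit, sparsity handling, ADC, and BN co-design are not modeled.

```python
def to_twos_complement_bits(value, width=8):
    """Return the two's-complement bits of value, least significant bit first."""
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in the given width")
    unsigned = value & ((1 << width) - 1)
    return [(unsigned >> i) & 1 for i in range(width)]


def bit_weight(position, width=8):
    """Signed weight of a bit position: the MSB counts as -2^(width-1)."""
    if position == width - 1:
        return -(1 << position)
    return 1 << position


def bitwise_signed_mac(activations, weights, width=8, bits_per_cycle=2):
    """Dot product built from 1b x 1b partial products (illustrative only).

    Activation bits are consumed bits_per_cycle at a time. Returns the
    accumulated result and the number of input cycles used.
    """
    a_bits = [to_twos_complement_bits(a, width) for a in activations]
    w_bits = [to_twos_complement_bits(w, width) for w in weights]
    acc = 0
    cycles = 0
    for start in range(0, width, bits_per_cycle):
        cycles += 1
        for i in range(start, min(start + bits_per_cycle, width)):
            for j in range(width):
                # Count of positions where both the activation bit and the
                # weight bit are 1 (a 1b MAC across the whole vector).
                count = sum(ab[i] & wb[j] for ab, wb in zip(a_bits, w_bits))
                acc += bit_weight(i, width) * bit_weight(j, width) * count
    return acc, cycles


activations = [-128, 127, -1, 5]
weights = [3, -128, -1, -7]
result, cycles = bitwise_signed_mac(activations, weights)
print(result, cycles)
```

Because only the sign-bit positions carry a negative weight, the same AND-and-count step serves every bit pair, and the sign is applied when the partial sums are combined. With 8b activations taken two bits at a time, the input is consumed in four cycles instead of eight.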
engineering, electrical & electronic