Sparsity-Oriented MRAM-Centric Computing for Efficient Neural Network Inference

Jia-Le Cui, Yanan Guo, Juntong Chen, Bo Liu, Hao Cai
DOI: https://doi.org/10.1109/tetc.2023.3326312
2024-01-01
IEEE Transactions on Emerging Topics in Computing
Abstract: Near-memory computing (NMC) and in-memory computing (IMC) paradigms play an important role in non-von Neumann architectures. Spin-transfer torque magnetic random access memory (STT-MRAM) is considered a promising candidate for realizing both NMC and IMC in resource-constrained applications. In this work, two MRAM-centric computing frameworks are proposed: triple-skipping NMC (TS-NMC) and analog-multi-bit-sparsity IMC (AMS-IMC). TS-NMC exploits the sparsity of activations and weights to implement a write-read-calculation triple-skipping computing scheme by means of a sparse flag generator. AMS-IMC, with a reconfigured computing bit-cell and flag generator, accommodates bit-level activation sparsity during computation. The STT-MRAM array and its peripheral circuits are implemented with an industrial 28-nm CMOS design kit and an MTJ compact model. The triple-skipping scheme reduces memory access energy consumption by 51.5× when processing zero vectors, compared to processing non-zero vectors. The energy efficiency of AMS-IMC is improved by 5.9× and 1.5× (with 75% input sparsity) compared to the conventional NMC framework and an existing analog IMC framework, respectively. Verification results show that TS-NMC and AMS-IMC achieve 98.6% and 97.5% inference accuracy in MNIST classification, with energy consumption of 14.2 nJ/pattern and 12.7 nJ/pattern, respectively.
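
The zero-vector skipping idea mentioned in the abstract can be illustrated with a small software sketch. This is not the authors' circuit or algorithm: the function names `sparse_flags` and `skipping_dot`, the vector granularity, and the rule "skip when either operand vector is all zero" are assumptions made for illustration, and the sketch only models the calculation skip, not the write or read skipping performed in hardware.

```python
from typing import List, Tuple


def sparse_flags(vectors: List[List[int]]) -> List[bool]:
    """Hypothetical flag generator: True marks an all-zero vector."""
    return [all(x == 0 for x in v) for v in vectors]


def skipping_dot(
    activations: List[List[int]], weights: List[List[int]]
) -> Tuple[int, int]:
    """Sum the dot products of paired vectors, skipping flagged pairs.

    Returns (accumulated result, number of skipped vector pairs).
    Assumption: a pair is skipped when either vector is all zero,
    since its contribution to the sum is zero anyway.
    """
    act_flags = sparse_flags(activations)
    w_flags = sparse_flags(weights)
    total = 0
    skipped = 0
    for a, w, a_zero, w_zero in zip(activations, weights, act_flags, w_flags):
        if a_zero or w_zero:
            skipped += 1  # no multiply-accumulate work for this pair
            continue
        total += sum(x * y for x, y in zip(a, w))
    return total, skipped


def dense_dot(activations: List[List[int]], weights: List[List[int]]) -> int:
    """Reference computation without any skipping."""
    return sum(
        x * y for a, w in zip(activations, weights) for x, y in zip(a, w)
    )
```

Skipping flagged pairs leaves the result identical to the dense computation, so in this simplified model the saving comes purely from the avoided work on zero vectors.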