APIM: An Antiferromagnetic MRAM-Based Processing-In-Memory System for Efficient Bit-level Operations of Quantized Convolutional Neural Networks
Yueting Li, Jinkai Wang, Daoqian Zhu, Jinhao Li, Au Do, Xueyan Wang, Yue Zhang, Weisheng Zhao
DOI: https://doi.org/10.1109/tcad.2024.3372453
IF: 2.9
2024-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: Quantized convolutional neural networks (QCNNs) are an attractive way to reduce hardware overhead, especially for energy-constrained systems. However, existing QCNNs still require non-trivial hardware resources and memory capacity to avoid compromising model accuracy. To address this issue, we propose an antiferromagnetic magnetic random-access memory (ARAM)-based processing-in-memory (PIM) system that leverages bit-level sparsity. Three techniques are proposed to optimize hardware resource utilization while preserving CNN accuracy. First, the ARAM-based memory subsystem allows dynamic adaptation of variable bit-widths across CNN layers. Second, the bit-level accelerator employs a bit-fusion format engineered for processing data from the ARAM subsystem. Third, a customized data path within the RISC-V core guarantees efficient instruction processing for the ARAM-based memory subsystem and the bit-level accelerator, enabling optimal bit-level data transmission and computation. Experimental results demonstrate that this design reduces data movement by 50%-83% across existing CNNs. Compared to state-of-the-art designs, it improves throughput and latency by an average of 5x and 10x, respectively. In addition, this design achieves speedups between 1.63x and 2.96x over other designs on the AlexNet, VGG16, and ResNet18 benchmarks.
engineering, electrical & electronic; computer science, interdisciplinary applications; hardware & architecture
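
The abstract's notion of bit-level sparsity with per-layer bit-widths can be illustrated in software. The following is only a minimal sketch, not the paper's ARAM hardware or its bit-fusion format: it assumes unsigned integer weights, and the names `bit_planes` and `bit_serial_dot` are hypothetical. It computes a dot product one weight bit at a time and skips every zero bit, counting how much work is avoided.

```python
def bit_planes(value: int, bits: int) -> list[int]:
    """Return the bits of an unsigned integer, least significant first."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{value} does not fit in {bits} unsigned bits")
    return [(value >> i) & 1 for i in range(bits)]


def bit_serial_dot(weights: list[int], activations: list[int], w_bits: int):
    """Bit-serial dot product that skips zero weight bits.

    Assumption: weights are unsigned and quantized to `w_bits` bits.
    Returns (result, performed_bit_ops, skipped_bit_ops).
    """
    total = 0
    performed = 0
    skipped = 0
    for w, a in zip(weights, activations):
        for i, bit in enumerate(bit_planes(w, w_bits)):
            if bit == 0:
                # A zero bit contributes nothing, so no shift-add is issued.
                skipped += 1
                continue
            # A one bit at position i contributes the activation shifted by i.
            total += a << i
            performed += 1
    return total, performed, skipped


if __name__ == "__main__":
    result, performed, skipped = bit_serial_dot([5, 0, 3], [2, 7, 4], w_bits=4)
    print(result, performed, skipped)
```

In this toy model, lowering `w_bits` for a layer shrinks the number of bit positions that must be stored and visited, which is the software analogue of adapting bit-width per layer; the actual savings reported in the abstract come from the authors' hardware design and are not reproduced here.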