Exploring Compute-in-Memory Architecture Granularity for Structured Pruning of Neural Networks

Fan-Hsuan Meng, Xinxin Wang, Ziyu Wang, Eric Yeu-Jer Lee, Wei D. Lu
DOI: https://doi.org/10.1109/jetcas.2022.3227471
IF: 5.877
2022-12-21
IEEE Journal on Emerging and Selected Topics in Circuits and Systems
Abstract: Compute-in-Memory (CIM) implemented with Resistive-Random-Access-Memory (RRAM) crossbars is a promising approach for Deep Neural Network (DNN) acceleration. As DNN sizes continue to grow, the finite on-chip weight storage has become a challenge for CIM implementations. Pruning can reduce network size, but unstructured pruning is not compatible with CIM, while structured pruning leads to a larger drop in neural network accuracy. In this work we systematically evaluate how structured pruning can be efficiently implemented in CIM systems. We show that by utilizing the inherent computational granularity of CIM operations, fine-grained structured pruning can be supported with improved accuracy and minimal hardware cost. We discuss the hardware implementation in a practical system and the expected performance in terms of accuracy, energy, and effective throughput. With the proposed approach, a compression ratio of up to 11.1 (i.e., 9% of weights remaining) can be achieved with only a 0.6% accuracy drop and minimal hardware overhead.
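To make the idea of fine-grained structured pruning and the quoted compression ratio concrete, here is a minimal Python sketch. It is not the authors' method: the group shape (a vertical segment of `group_rows` consecutive rows within one column), the L1-magnitude ranking, and the names `prune_groups` and `compression_ratio` are all assumptions chosen for illustration. The only figure taken from the abstract is the relationship between a compression ratio of 11.1 and roughly 9% of weights remaining (1 / 11.1 ≈ 0.09).

```python
import random


def prune_groups(weights, group_rows, keep_fraction):
    """Zero out the lowest-magnitude weight groups (illustrative only).

    weights: list of rows (list of lists of floats), shape rows x cols.
    group_rows: hypothetical pruning granularity, i.e. how many
        consecutive rows of one column form a prunable group.
    keep_fraction: fraction of groups to keep, in (0, 1].
    Returns (pruned copy of weights, set of kept (col, start_row) groups).
    """
    rows, cols = len(weights), len(weights[0])
    # One group per (column, starting row) vertical segment.
    groups = [(c, r) for c in range(cols) for r in range(0, rows, group_rows)]

    def l1_norm(group):
        c, r0 = group
        return sum(abs(weights[r][c])
                   for r in range(r0, min(r0 + group_rows, rows)))

    # Assumption: rank groups globally by L1 norm and keep the largest.
    n_keep = max(1, round(keep_fraction * len(groups)))
    kept = set(sorted(groups, key=l1_norm, reverse=True)[:n_keep])

    pruned = [row[:] for row in weights]
    for c, r0 in groups:
        if (c, r0) not in kept:
            for r in range(r0, min(r0 + group_rows, rows)):
                pruned[r][c] = 0.0
    return pruned, kept


def compression_ratio(n_total, n_remaining):
    """Compression ratio = total weights / weights remaining."""
    return n_total / n_remaining


# Small demo on a random 64 x 16 matrix with 8-row groups.
random.seed(0)
demo = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(64)]
demo_pruned, demo_kept = prune_groups(demo, group_rows=8, keep_fraction=0.09)
remaining = sum(1 for row in demo_pruned for v in row if v != 0.0)
print(compression_ratio(64 * 16, remaining))
```

Because whole groups are removed rather than individual weights, the surviving weights keep a regular layout; how such groups would map onto actual crossbar rows and columns depends on the hardware and is not specified in the abstract.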
engineering, electrical & electronic