A Configurable Multi-Precision CNN Computing Framework Based on Single Bit RRAM

Zhenhua Zhu, Hanbo Sun, Yujun Lin, Guohao Dai, Lixue Xia, Song Han, Yu Wang, Huazhong Yang
DOI: https://doi.org/10.1145/3316781.3317739
2019-01-01
Abstract: Convolutional Neural Networks (CNNs) play a vital role in machine learning. Emerging resistive random-access memories (RRAMs) and RRAM-based Processing-In-Memory architectures have demonstrated great potential for boosting both the performance and the energy efficiency of CNNs. However, because the process technology is still immature, it is hard to implement and fabricate a CNN accelerator chip based on multi-bit RRAM devices. In addition, existing single-bit RRAM-based CNN accelerators focus only on binary or ternary CNNs, which lose more than 10% accuracy compared with full-precision CNNs. This paper proposes a configurable multi-precision CNN computing framework based on single-bit RRAM, which consists of an RRAM computing-overhead-aware network quantization algorithm and a configurable multi-precision CNN computing architecture based on single-bit RRAM. Through multi-precision quantization, the proposed method achieves accuracy equivalent to that of full-precision CNNs while lowering storage consumption and latency. The designed architecture supports accelerating multi-precision CNNs, even when the precision varies among different layers. Experimental results show that the proposed framework can reduce computing area by 70% and computing energy by 75% on average, with nearly no accuracy loss. The equivalent energy efficiency is 1.6x to 8.6x that of existing RRAM-based architectures, with only 1.07% area overhead.
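To make the idea of multi-precision computation on single-bit cells concrete, the following is a minimal sketch of bit-slicing, a common way to represent a multi-bit weight with several one-bit storage cells. It is an illustration under stated assumptions, not the architecture described in the paper: the function names `to_bit_planes` and `bit_sliced_dot` are hypothetical, weights are assumed to be unsigned integers, and analog effects such as ADC resolution and device variation are ignored.

```python
def to_bit_planes(weights, bits):
    """Split unsigned integer weights into `bits` binary planes, LSB first.

    Each plane holds only 0/1 values, so it could be stored in
    single-bit cells (an assumption made for this illustration).
    """
    for w in weights:
        if not 0 <= w < (1 << bits):
            raise ValueError(f"weight {w} does not fit in {bits} bits")
    return [[(w >> b) & 1 for w in weights] for b in range(bits)]


def bit_sliced_dot(inputs, weights, bits):
    """Dot product computed one weight bit plane at a time.

    Each plane yields a partial sum over binary cells; the partial
    sums are then combined with shift-and-add.
    """
    total = 0
    for b, plane in enumerate(to_bit_planes(weights, bits)):
        partial = sum(x * cell for x, cell in zip(inputs, plane))
        total += partial << b  # weight the plane by 2**b
    return total


# A layer quantized to 3 bits uses 3 planes; an 8-bit layer would use 8.
print(bit_sliced_dot([3, 1, 2], [5, 2, 7], bits=3))
```

In this sketch the `bits` argument can differ per layer, so a lower-precision layer needs fewer bit planes, which is one way to picture why per-layer precision affects storage and computing cost.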