A Convolution Neural Network Accelerator Design with Weight Mapping and Pipeline Optimization

Lixia Han, Peng Huang, Zheng Zhou, Yiyang Chen, Xiaoyan Liu, Jinfeng Kang
DOI: https://doi.org/10.1109/dac56929.2023.10247977
2023-01-01
Abstract: Pipelining is an efficient way to boost performance in non-volatile-memory-based computing-in-memory (nvCIM) convolution neural network (CNN) accelerators. However, previous works seldom optimize the pipeline from the perspective of the whole system, and in particular overlook the effect of buffer access. In this work, we propose a high-performance NVM-based CNN accelerator with a balanced pipeline design that takes into account both macro computing and buffer access. At the operator level, a matrix-based weight mapping method is proposed to reduce buffer access delay. At the macro level, a decoupled access-and-execution design is introduced to shorten single-layer latency. At the system level, a hybrid inter/intra-tile design is presented to balance the overall latency across CNN layers. By combining these three methods, we construct a well-balanced pipeline for the nvCIM accelerator at a smaller hardware cost. Experiments show that our pipeline design achieves 3.7×, 7.5×, and 3.5× throughput improvements for ImageNet recognition with the ResNet18, VGG19, and ResNet34 models, respectively.
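As a rough illustration of why buffer access matters for pipeline balance, the toy model below treats each CNN layer as one pipeline stage whose latency has a compute part and a buffer-access part. It is a generic sketch, not the authors' design: the function names are hypothetical, the latency values are made up, and it assumes that decoupling access from execution lets the two fully overlap.

```python
from typing import List, Tuple


def stage_latency(compute: float, access: float, decoupled: bool) -> float:
    """Latency of one layer, modelled as one pipeline stage.

    Coupled: buffer access and macro compute run back to back (sum).
    Decoupled: assumed to overlap fully, so the longer one dominates (max).
    """
    return max(compute, access) if decoupled else compute + access


def pipeline_interval(layers: List[Tuple[float, float]], decoupled: bool) -> float:
    """Steady-state interval between outputs, set by the slowest stage."""
    return max(stage_latency(c, a, decoupled) for c, a in layers)


# Hypothetical (compute, buffer access) latencies per layer, arbitrary units.
layers = [(4.0, 3.0), (2.0, 6.0), (5.0, 1.0)]

coupled = pipeline_interval(layers, decoupled=False)
overlapped = pipeline_interval(layers, decoupled=True)
speedup = coupled / overlapped

print(f"coupled interval:   {coupled}")
print(f"decoupled interval: {overlapped}")
print(f"throughput gain:    {speedup:.2f}x")
```

In this model the second layer stays the bottleneck even after overlapping, because its buffer access alone is longer than any other stage. That is the kind of imbalance that reducing access delay and rebalancing work across layers would target.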