PIM-HLS: An Automatic Hardware Generation Tool for Heterogeneous Processing-In-Memory-based Neural Network Accelerators.

Yu Zhu, Zhenhua Zhu, Guohao Dai, Fengbin Tu, Hanbo Sun, Kwang-Ting Cheng, Huazhong Yang, Yu Wang
DOI: https://doi.org/10.1109/DAC56929.2023.10247755
2023-01-01
Abstract: Processing-in-memory (PIM) architectures have shown great potential for neural network (NN) acceleration on edge devices that demand low latency under severe area constraints. Heterogeneous PIM architectures that combine different PIM implementation approaches, such as RRAM-based PIM and SRAM-based PIM, can further improve performance. However, the automatic generation of heterogeneous PIM architectures faces two unresolved problems. First, existing work has not considered the design of heterogeneous PIM-based NN accelerators with multiple memory technologies. Second, for PIM with insufficient memory on edge devices, it is challenging to find the optimal runtime weight scheduling strategy in an O(L!) optimization space for an NN with L layers. In this paper, we propose PIM-HLS, an automatic hardware generation tool for heterogeneous PIM-based NN accelerators. To address these problems, we first show that heterogeneous PIM can improve performance under severe area constraints. We then optimize the architecture for each NN layer by taking advantage of different memory technologies. We also define the optimization problem of runtime weight scheduling and mapping for the first time, and propose a dynamic-programming-based weight scheduling algorithm that reduces the optimization space to O(L^2). We implement PIM-HLS to automatically generate the hardware code and the instructions. Results show that PIM-HLS achieves an average 5.9x speedup with 72.8% less area compared with state-of-the-art PIM designs.
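The abstract does not spell out the scheduling algorithm, so the following is only an illustrative sketch of how a dynamic program over layer prefixes can keep the search quadratic in L. It is not the PIM-HLS algorithm. It assumes a deliberately simplified model: layers run in a fixed order, weights are loaded in contiguous groups that must fit in the on-chip memory, and each reload costs a fixed overhead plus the size of the weights written. The function name `schedule_weights` and the parameters `layer_sizes`, `capacity`, and `reload_overhead` are all hypothetical.

```python
from itertools import accumulate


def schedule_weights(layer_sizes, capacity, reload_overhead=1.0):
    """Split layers (in execution order) into contiguous reload groups.

    Hypothetical cost model: each group costs `reload_overhead` plus the
    total weight size it writes, and must fit within `capacity`.
    Returns (total_cost, groups), where groups is a list of layer-index lists.
    """
    n = len(layer_sizes)
    # prefix[k] is the total weight size of layers 0..k-1.
    prefix = [0] + list(accumulate(layer_sizes))
    inf = float("inf")
    best = [inf] * (n + 1)   # best[j]: minimum cost to schedule the first j layers
    cut = [-1] * (n + 1)     # cut[j]: start index of the last group in that schedule
    best[0] = 0.0

    # O(L^2) pairs (i, j): the last group covers layers i..j-1.
    for j in range(1, n + 1):
        for i in range(j - 1, -1, -1):
            size = prefix[j] - prefix[i]
            if size > capacity:
                break  # sizes are non-negative, so larger groups cannot fit either
            cost = best[i] + reload_overhead + size
            if cost < best[j]:
                best[j] = cost
                cut[j] = i

    if best[n] == inf:
        raise ValueError("a single layer exceeds the memory capacity")

    # Walk the cut points backwards to recover the groups.
    groups = []
    j = n
    while j > 0:
        i = cut[j]
        groups.append(list(range(i, j)))
        j = i
    groups.reverse()
    return best[n], groups
```

Under these assumptions, `schedule_weights([4, 3, 2, 5, 1], capacity=7)` splits the five layers into three reload groups. The table `best` has L + 1 entries and each entry inspects at most L predecessors, which is where the quadratic bound in this toy model comes from.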