DyPIM: Dynamic-Inference-Enabled Processing-In-Memory Accelerator

Tongxin Xie, Tianchen Zhao, Zhenhua Zhu, Xuefei Ning, Bing Li, Guohao Dai, Huazhong Yang, Yu Wang
DOI: https://doi.org/10.23919/date58400.2024.10546612
2024-01-01
Abstract: Dynamic neural networks are an emerging research topic in deep learning. During inference, dynamic networks selectively skip redundant computations conditioned on the input (i.e., dynamic inference), and they have demonstrated superior trade-offs between accuracy and inference efficiency. However, fine-grained computation skipping makes memory I/O irregular and dominant in dynamic networks. Processing-In-Memory (PIM) performs matrix-vector multiplications inside the memory, eliminating the data movement of network parameters, which makes it a promising way to address this memory I/O challenge. However, deploying dynamic networks on PIM architectures suffers severe performance degradation caused by (1) pipeline stalls while deciding which computations to skip, (2) a mismatch between fine-grained algorithmic computation skipping and coarse-grained hardware computing granularity, and (3) an improper proxy of hardware performance during training. To tackle these problems, we propose DyPIM, a dynamic-inference-enabled PIM accelerator with software-hardware co-optimizations. At the algorithm level, we propose a PIM-friendly dynamic network with a standalone mask generation network, together with a throughput-optimal training technique. At the hardware level, we propose a PIM architecture that supports dynamic networks, with a pipeline controller that processes the dynamic dataflow. Peripheral circuits in the processing units enable non-contiguous activation of non-zero wordlines, making better use of computation skipping. Experiments show that DyPIM achieves 1.52× to 2.74× speedup and 2.05× to 3.95× throughput improvement over existing PIM architectures on ResNet networks.
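The granularity mismatch in problem (2), and why non-contiguous wordline activation helps, can be illustrated with a simplified cycle-count model. This is a minimal sketch under stated assumptions, not DyPIM's implementation: it assumes each input channel maps to one crossbar wordline and that the crossbar activates at most `rows_per_cycle` wordlines per cycle (a hypothetical parameter). It also substitutes a plain top-k selection (`topk_mask`, a hypothetical helper) for the paper's standalone mask generation network.

```python
import math
from typing import List, Sequence


def topk_mask(scores: Sequence[float], keep: int) -> List[int]:
    """Hypothetical stand-in for a mask generator: keep the `keep` highest-scoring
    input channels (1 = compute, 0 = skip). Not the paper's mask network."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(order[:keep])
    return [1 if i in kept else 0 for i in range(len(scores))]


def cycles_contiguous(mask: Sequence[int], rows_per_cycle: int) -> int:
    """Crossbar cycles when wordlines are activated in fixed contiguous groups:
    a group can be skipped only if every channel mapped to it is masked out."""
    return sum(
        1
        for start in range(0, len(mask), rows_per_cycle)
        if any(mask[start:start + rows_per_cycle])
    )


def cycles_noncontiguous(mask: Sequence[int], rows_per_cycle: int) -> int:
    """Crossbar cycles when any subset of non-zero wordlines can be activated
    together, so only the number of kept channels matters."""
    return math.ceil(sum(1 for m in mask if m) / rows_per_cycle)


if __name__ == "__main__":
    rows_per_cycle = 4  # hypothetical number of wordlines activated per cycle
    # Scores for 16 input channels; the 4 largest land in different row groups.
    scores = [0.9, 0.1, 0.2, 0.0,
              0.3, 0.8, 0.1, 0.2,
              0.0, 0.1, 0.7, 0.3,
              0.2, 0.0, 0.1, 0.6]
    mask = topk_mask(scores, keep=4)
    print("mask:", mask)
    print("dense cycles:", math.ceil(len(mask) / rows_per_cycle))
    print("contiguous cycles:", cycles_contiguous(mask, rows_per_cycle))
    print("non-contiguous cycles:", cycles_noncontiguous(mask, rows_per_cycle))
```

In this toy case, 75% of the channels are skipped, but each kept channel sits in a different 4-row group, so contiguous activation still needs 4 cycles, the same as dense execution, while non-contiguous activation needs 1. The model ignores ADC resolution, weight mapping, and pipeline effects that a real PIM design must handle.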