Minimizing Communication Conflicts in Network-On-Chip Based Processing-In-Memory Architecture

Hanbo Sun, Tongxin Xie, Zhenhua Zhu, Guohao Dai, Huazhong Yang, Yu Wang
DOI: https://doi.org/10.23919/DATE56975.2023.10137203
2023-01-01
Abstract: Deep Neural Networks (DNNs) have made significant breakthroughs in various fields. However, their enormous computation and parameter counts seriously hinder their application. Emerging Processing-In-Memory (PIM) architectures provide extremely high energy efficiency for accelerating DNN computing, and Network-on-Chip (NoC) based PIM architectures significantly improve their scalability. However, the mismatch between heavy communication demand and limited NoC bandwidth introduces severe communication conflicts, whose impact existing work neglects. On the one hand, neglecting communication conflicts leaves the mapping process without precise performance estimates, making it hard to find optimal results. On the other hand, communication conflicts cause low NoC bandwidth utilization in the scheduling process. In existing work, communication conflicts cause a latency gap of over 70%. This paper proposes communication-conflict-optimized mapping and scheduling strategies for NoC-based PIM architectures. The proposed mapping strategy constructs communication conflict graphs to model communication conflicts. Based on the constructed graph, we adopt a Graph Neural Network (GNN) as a precise performance estimator. Our scheduling strategy predefines the communication priority and NoC communication behavior tables for the target DNN workloads, which effectively improves NoC bandwidth utilization. Compared with existing work, for typical classification DNNs on the CIFAR and ImageNet datasets, the proposed strategies reduce latency by 78% and improve throughput by 3.33x on average, with negligible deployment and hardware overhead. Experimental results also show that our strategies reduce the average gap to the ideal conflict-free case from 80.7% to 12.3% for latency and from 70% to 1.26% for throughput.
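The abstract does not say how the communication conflict graph is built. As a rough illustration only, the sketch below assumes a 2D mesh NoC with dimension-ordered (XY) routing and treats two transfers as conflicting when their routes share a directed link. The names xy_route and build_conflict_graph, the tuple-based flow format, and the conflict criterion are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def xy_route(src, dst):
    """Return the directed links an XY-routed packet traverses (assumed routing)."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:  # travel along the X dimension first
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:  # then along the Y dimension
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def build_conflict_graph(flows):
    """Map each flow index to the set of flow indices sharing a directed link with it."""
    routes = [set(xy_route(src, dst)) for src, dst in flows]
    graph = {i: set() for i in range(len(flows))}
    for i, j in combinations(range(len(flows)), 2):
        if routes[i] & routes[j]:  # assumed criterion: any shared link is a conflict
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Hypothetical flows, each given as (source tile, destination tile) on the mesh.
flows = [((0, 0), (2, 0)), ((1, 0), (2, 1)), ((0, 1), (0, 2))]
conflicts = build_conflict_graph(flows)
print(conflicts)
```

In this toy setup, the first two flows both use the link from tile (1, 0) to tile (2, 0) and are therefore neighbors in the graph, while the third flow is isolated. A graph of this kind is the sort of structure a GNN-based estimator could take as input, although the paper's actual node and edge features are not described here.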