ILP-based Multi-Branch CNNs Mapping on Processing-in-Memory Architecture

Haodong Han, Junpeng Wang, Bo Ding, Song Chen
DOI: https://doi.org/10.1109/aicas59952.2024.10595921
2024-01-01
Abstract: 3D-stacked-DRAM-based processing-in-memory (DRAM-PIM) architectures offer high memory access bandwidth and energy efficiency, and effectively mitigate the memory wall challenge posed by CNNs. However, DRAM-PIM architectures present a huge mapping space for multi-branch CNNs, and a poor mapping increases both the latency of CNNs and the memory requirement of nodes. In this work, we propose an integer linear programming (ILP) method that integrates layer scheduling and resource quantity allocation to minimize overall latency. We also introduce an ILP-based binding method that binds layers onto the node array of a DRAM-PIM architecture while reducing the maximum memory requirement of nodes. Experimental results demonstrate that our method reduces the latency of branching structures in CNNs and achieves better memory balancing between nodes than the baseline method.