PIMCOMP: An End-to-End DNN Compiler for Processing-In-Memory Accelerators

Xiaotian Sun,Xinyu Wang,Wanqian Li,Yinhe Han,Xiaoming Chen

2024-11-14

Abstract: Various processing-in-memory (PIM) accelerators based on various devices, micro-architectures, and interfaces have been proposed to accelerate deep neural networks (DNNs). How to deploy DNNs onto PIM-based accelerators is the key to exploiting PIM's high performance and energy efficiency. The scale of DNN models, the diversity of PIM accelerators, and the complexity of deployment are far beyond what manual deployment can handle. Hence, an automatic deployment methodology is indispensable. In this work, we propose PIMCOMP, an end-to-end DNN compiler tailored for PIM accelerators, achieving efficient deployment of DNN models on PIM hardware. PIMCOMP can adapt to various PIM architectures by using an abstract configurable PIM accelerator template with a set of pseudo-instructions, which is a high-level abstraction of the hardware's fundamental functionalities. Through a generic multi-level optimization framework, PIMCOMP realizes an end-to-end conversion from a high-level DNN description to pseudo-instructions, which can be further converted to specific hardware intrinsics/primitives. The compilation addresses two critical issues in PIM-accelerated inference from a system perspective: resource utilization and dataflow scheduling. For resource utilization, PIMCOMP adopts a flexible unfolding format to reshape and partition convolutional layers, adopts a weight-layout guided computation-storage-mapping approach, and balances the system's computation, memory access, and communication characteristics. For dataflow scheduling, we design two scheduling algorithms with different inter-layer pipeline granularities to support varying application scenarios while ensuring high computational parallelism. Experiments demonstrate that PIMCOMP improves throughput, latency, and energy efficiency across various architectures.
PIMCOMP is open-sourced at https://github.com/sunxt99/PIMCOMP-NN.
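The abstract mentions two scheduling algorithms with different inter-layer pipeline granularities. The toy timing model below is only a rough illustration of why granularity matters; it is not PIMCOMP's scheduling algorithm. It assumes a linear chain of layers with fixed per-layer times, unlimited buffering between layers, and work that splits evenly into chunks. The function names `pipeline_makespan` and `single_input_latency` are hypothetical.

```python
from typing import Sequence


def pipeline_makespan(stage_times: Sequence[float], num_items: int) -> float:
    """Time for num_items identical items to pass through a linear pipeline.

    Textbook formula for identical items: the first item needs the sum of
    all stage times, and each further item adds one slot of the slowest stage.
    """
    if num_items < 1:
        raise ValueError("num_items must be at least 1")
    if not stage_times:
        raise ValueError("stage_times must not be empty")
    return sum(stage_times) + (num_items - 1) * max(stage_times)


def single_input_latency(layer_times: Sequence[float], chunks: int) -> float:
    """Latency of ONE input when each layer's work is split into equal chunks.

    Assumption (toy model): a layer can start on a chunk as soon as the
    previous layer has produced it, so chunks flow through the layers like
    items through a pipeline. chunks=1 means a layer waits for the whole
    output of its predecessor.
    """
    per_chunk = [t / chunks for t in layer_times]
    return pipeline_makespan(per_chunk, chunks)


if __name__ == "__main__":
    layer_times = [4.0, 2.0, 6.0]  # made-up per-layer times, arbitrary units

    # Coarse granularity: whole inputs are the pipeline items.
    print("batch of 10 inputs:", pipeline_makespan(layer_times, 10))

    # Finer granularity: chunks of a single input are the pipeline items.
    for chunks in (1, 4, 16):
        print(f"{chunks} chunk(s):", single_input_latency(layer_times, chunks))
```

In this model, pipelining whole inputs helps when many inputs are available (throughput is bounded by the slowest layer), while splitting a single input into finer chunks shortens the latency of that one input. Real convolutional layers have data dependencies that this sketch ignores.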

Hardware Architecture

What problem does this paper attempt to address?

This paper addresses how to deploy deep neural networks (DNNs) effectively onto processing-in-memory (PIM) accelerators. Specifically, it focuses on the following key challenges:

1. **Resource Utilization and Dataflow Scheduling**: As DNN models grow in scale and diversity, and as PIM accelerator architectures grow more complex, manually designing a deployment scheme for each DNN model on each PIM accelerator becomes uneconomical and unrealistic. A compiler that deploys DNN models automatically is therefore required to improve the usability of PIM accelerators and help build the PIM ecosystem.
2. **Hardware Compatibility**: Different PIM accelerators have different micro-architectures and interfaces, so the compiler must adapt to various PIM architectures. This is achieved with an abstract, configurable PIM accelerator template and a set of pseudo-instructions, which are high-level abstractions of the basic hardware functions.
3. **Software Support**: The compiler should support multiple DNN workloads and different application scenarios, such as those requiring low latency or high throughput. It should also complete model reading, weight mapping, and output collection automatically, without user intervention.
4. **System-level Optimization**: The compiler needs to handle resource allocation and dataflow scheduling effectively to unleash the hardware's potential. For resource allocation, it should make full use of PIM resources while balancing computation, memory access, and communication; for dataflow scheduling, it should quickly generate instruction streams for different scenarios and relieve system performance bottlenecks.

To address these challenges, the paper proposes PIMCOMP, an end-to-end DNN compiler designed specifically for PIM accelerators. Through a multi-level optimization framework, PIMCOMP converts high-level DNN descriptions into pseudo-instructions, which can then be converted into hardware-specific intrinsics/primitives. Experimental results show that PIMCOMP improves throughput, latency, and energy efficiency across various architectures.
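To make the resource-utilization question concrete, here is a minimal sketch of how a convolutional layer could be unfolded into a weight matrix and tiled onto fixed-size crossbars. It assumes the common im2col-style unfolding (one row per kernel element per input channel, one column per output channel) and one crossbar cell per weight by default. It is not PIMCOMP's flexible unfolding format or its mapping algorithm, and all names (`ConvLayer`, `crossbars_needed`, `utilization`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ConvLayer:
    """Shape of a convolutional layer (hypothetical helper type)."""
    in_channels: int
    out_channels: int
    kernel_h: int
    kernel_w: int


def unfolded_weight_shape(layer: ConvLayer) -> Tuple[int, int]:
    """Rows and columns of the im2col-style unfolded weight matrix."""
    rows = layer.kernel_h * layer.kernel_w * layer.in_channels
    cols = layer.out_channels
    return rows, cols


def crossbars_needed(layer: ConvLayer, xbar_rows: int, xbar_cols: int,
                     cells_per_weight: int = 1) -> int:
    """Number of crossbars required to hold one copy of the layer's weights."""
    rows, cols = unfolded_weight_shape(layer)
    row_tiles = math.ceil(rows / xbar_rows)
    col_tiles = math.ceil(cols * cells_per_weight / xbar_cols)
    return row_tiles * col_tiles


def utilization(layer: ConvLayer, xbar_rows: int, xbar_cols: int,
                cells_per_weight: int = 1) -> float:
    """Fraction of allocated crossbar cells that actually store weights."""
    rows, cols = unfolded_weight_shape(layer)
    used_cells = rows * cols * cells_per_weight
    allocated = (crossbars_needed(layer, xbar_rows, xbar_cols, cells_per_weight)
                 * xbar_rows * xbar_cols)
    return used_cells / allocated


if __name__ == "__main__":
    # Example shape chosen for illustration only.
    layer = ConvLayer(in_channels=64, out_channels=128, kernel_h=3, kernel_w=3)
    print(unfolded_weight_shape(layer))       # (576, 128)
    print(crossbars_needed(layer, 128, 128))  # 5
    print(utilization(layer, 128, 128))       # 0.9
```

In this example the 576-row matrix needs five 128-row tiles, and the last tile is only half full. Partial tiles like this are the kind of waste that a smarter unfolding and partitioning scheme would aim to reduce.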

PIMCOMP: A Universal Compilation Framework for Crossbar-based PIM DNN Accelerators

A design framework for processing-in-memory accelerator

Functionality-Based Processing-in-Memory Accelerator for Deep Convolutional Neural Networks

PIMulator-NN: an Event-Driven, Cross-level Simulation Framework for Processing-In-Memory Based Neural Network Accelerators

Instruction Set Architecture (ISA) for Processing-in-Memory DNN Accelerators

NicePIM: Design Space Exploration for Processing-In-Memory DNN Accelerators with 3D-Stacked-DRAM

DyPIM: Dynamic-Inference-Enabled Processing-In-Memory Accelerator

PIM-HLS: An Automatic Hardware Generation Tool for Heterogeneous Processing-In-Memory-based Neural Network Accelerators

PIMSIM-NN: An ISA-based Simulation Framework for Processing-in-Memory Accelerators

Generalized Ping-Pong: Off-Chip Memory Bandwidth Centric Pipelining Strategy for Processing-In-Memory Accelerators

CMP-PIM: An Energy-Efficient Comparator-based Processing-In-Memory Neural Network Accelerator

EPIM: Efficient Processing-In-Memory Accelerators based on Epitome

SP-PIM: A Super-Pipelined Processing-In-Memory Accelerator With Local Error Prediction for Area/Energy-Efficient On-Device Learning

NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing

PIMSYN: Synthesizing Processing-in-memory CNN Accelerators

pPIM: A Programmable Processor-in-Memory Architecture With Precision-Scaling for Deep Learning

GIM: Versatile GNN Acceleration with Reconfigurable Processing-in-Memory

PIMSAB: A Processing-In-Memory System with Spatially-Aware Communication and Bit-Serial-Aware Computation

Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud