Abstract: Processing-in-memory (PIM) accelerators built on a range of devices, micro-architectures, and interfaces have been proposed to accelerate deep neural networks (DNNs). Deploying DNNs onto PIM-based accelerators is the key to exploiting PIM's high performance and energy efficiency. The scale of DNN models, the diversity of PIM accelerators, and the complexity of deployment put manual deployment out of reach, so an automatic deployment methodology is indispensable. In this work, we propose PIMCOMP, an end-to-end DNN compiler tailored for PIM accelerators that achieves efficient deployment of DNN models on PIM hardware. PIMCOMP adapts to various PIM architectures by using an abstract, configurable PIM accelerator template with a set of pseudo-instructions, which serve as a high-level abstraction of the hardware's fundamental functionalities. Through a generic multi-level optimization framework, PIMCOMP performs an end-to-end conversion from a high-level DNN description to pseudo-instructions, which can be further converted to specific hardware intrinsics/primitives. The compilation addresses two critical issues in PIM-accelerated inference from a system perspective: resource utilization and dataflow scheduling. To improve resource utilization, PIMCOMP adopts a flexible unfolding format to reshape and partition convolutional layers, adopts a weight-layout guided computation-storage-mapping approach, and balances the system's computation, memory access, and communication characteristics. For dataflow scheduling, we design two scheduling algorithms with different inter-layer pipeline granularities to support varying application scenarios while ensuring high computational parallelism. Experiments demonstrate that PIMCOMP improves throughput, latency, and energy efficiency across various architectures.
PIMCOMP is open-sourced at https://github.com/sunxt99/PIMCOMP-NN.
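
To make the resource-utilization problem concrete, the sketch below shows one common way a convolutional layer can be unfolded into a 2-D weight matrix and tiled across fixed-size crossbar arrays. It is only an illustration under stated assumptions, not PIMCOMP's actual unfolding format or mapping algorithm: the function names, the one-filter-per-column layout, the 128x128 crossbar size, and the one-weight-per-cell simplification (no bit slicing) are all hypothetical.

```python
import math


def unfold_conv_weights(out_channels, in_channels, kernel_h, kernel_w):
    """Shape of the 2-D matrix obtained by flattening each filter into one column.

    Assumption: one column per output channel, one row per (channel, ky, kx) weight.
    """
    rows = in_channels * kernel_h * kernel_w
    cols = out_channels
    return rows, cols


def crossbars_needed(rows, cols, xbar_rows, xbar_cols):
    """Number of crossbar tiles along each dimension, and the total count."""
    row_tiles = math.ceil(rows / xbar_rows)
    col_tiles = math.ceil(cols / xbar_cols)
    return row_tiles, col_tiles, row_tiles * col_tiles


def utilization(rows, cols, xbar_rows, xbar_cols):
    """Fraction of allocated crossbar cells that actually hold a weight."""
    _, _, total = crossbars_needed(rows, cols, xbar_rows, xbar_cols)
    return (rows * cols) / (total * xbar_rows * xbar_cols)


if __name__ == "__main__":
    # Hypothetical layer: 64 input channels, 128 output channels, 3x3 kernel.
    rows, cols = unfold_conv_weights(128, 64, 3, 3)
    print("unfolded matrix:", rows, "x", cols)
    print("tiles (row, col, total):", crossbars_needed(rows, cols, 128, 128))
    print("cell utilization:", utilization(rows, cols, 128, 128))
```

In this hypothetical case the 576-row matrix does not divide evenly into 128-row crossbars, so part of the last tile sits idle. Waste of this kind is what a compiler's choice of unfolding and partitioning can reduce.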
