Abstract: Deep neural networks (DNNs) have recently gained significant prominence in various real-world applications such as image recognition, natural language processing, and autonomous vehicles. However, due to their black-box nature, the mechanisms behind DNN inference results remain opaque to users. To address this challenge, researchers have focused on developing explainable artificial intelligence (AI) algorithms. Explainable AI aims to provide a clear and human-understandable explanation of the model's decision, thereby building more reliable systems. However, the explanation task differs from the well-known inference and training processes as it involves interactions with the user. Consequently, existing inference and training accelerators are inefficient when processing explainable AI on edge devices. This article introduces the explainable processing unit (EPU), the first hardware accelerator designed for explainable AI workloads. The EPU uses a novel data compression format for the output heat maps and intermediate gradients to enhance overall system performance by reducing both memory footprint and external memory access. Its sparsity-free computing core efficiently handles input sparsity with negligible control overhead, resulting in a throughput boost of up to 9.48×. The EPU also employs dynamic workload scheduling with a customized on-chip network for distinct inference and explanation tasks to maximize internal data reuse, reducing external memory access by 63.7%. Furthermore, the EPU incorporates point-wise gradient pruning (PGP), which, combined with the proposed compression format, reduces the size of heat maps by 7.01×. Finally, the EPU chip, fabricated in a 28 nm CMOS process, achieves a heat map generation rate of 367 frames/s for ResNet-34 while maintaining state-of-the-art area and energy efficiency of 112.3 GOPS/mm² and 26.55 TOPS/W, respectively.

NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units

Hardware-Assisted Virtualization of Neural Processing Units for Cloud Platforms

UIC: A Unified and Scalable Chip Integrating Neuromorphic Computation and General Purpose Processor

HuNT: Exploiting Heterogeneous PIM Devices to Design a 3-D Manycore Architecture for DNN Training

NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing

Hardware Memory Management for Future Mobile Hybrid Memory Systems

A heterogeneous computing system with memristor-based neuromorphic accelerators

Dynamically Reconfigurable Memory Address Mapping for General-Purpose Graphics Processing Unit

Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

Analog CMOS-based Resistive Processing Unit for Deep Neural Network Training

G-NMP: Accelerating Graph Neural Networks with DIMM-based Near-Memory Processing

In-depth analyses of unified virtual memory system for GPU accelerated computing

GPUVM: GPU-driven Unified Virtual Memory

EPU: An Energy-Efficient Explainable AI Accelerator With Sparsity-Free Computation and Heat Map Compression/Pruning

MPU: Towards Bandwidth-abundant SIMT Processor via Near-bank Computing

Memory Access Optimization of a Neural Network Accelerator Based on Memory Controller

A Precision-Optimized Fixed-Point Near-Memory Digital Processing Unit for Analog In-Memory Computing

Special Topic on Nonvolatile Memory for Efficient Implementation of Neural/Neuromorphic Computing

NEUTRAMS: Neural Network Transformation and Co-Design under Neuromorphic Hardware Constraints

PIM-MMU: A Memory Management Unit for Accelerating Data Transfers in Commercial PIM Systems

UNPU: An Energy-Efficient Deep Neural Network Accelerator With Fully Variable Weight Bit Precision