Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition

Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna

2024-04-01

Abstract: Exploiting sparsity in deep neural networks (DNNs) has been a promising way to meet the growing computational needs of modern DNNs. In practice, however, sparse DNN acceleration still faces a key challenge. To minimize the overhead of sparse acceleration, hardware designers have recently proposed structured sparse hardware support, which provides limited flexibility and requires extra model fine-tuning. Moreover, a sparse model fine-tuned for one kind of structured sparse hardware cannot be accelerated by other structured sparse hardware. To bridge the gap between sparse DNN models and hardware, this paper proposes tensor approximation via structured decomposition (TASD), which leverages the distributive property in linear algebra to turn any sparse tensor into a series of structured sparse tensors. Next, we develop a software framework, TASDER, to accelerate DNNs by searching for layer-wise, high-quality structured decompositions of both weight and activation tensors so that they can be accelerated by any system with structured sparse hardware support. Evaluation results show that, by exploiting prior structured sparse hardware baselines, our method can accelerate off-the-shelf dense and sparse DNNs without fine-tuning and improves energy-delay product by up to 83% and by 74% on average.

Artificial Intelligence, Hardware Architecture

What problem does this paper attempt to address?

The paper attempts to address the challenges faced when leveraging sparsity to accelerate computation in deep neural networks (DNNs). Specifically, existing sparse DNN acceleration methods have the following issues:

1. **Limited hardware support**: To reduce the overhead of sparse acceleration, hardware designers have proposed structured sparse hardware support. However, this support has low flexibility and requires additional model fine-tuning.
2. **Strong hardware dependency**: Sparse models fine-tuned for specific structured sparse hardware cannot be accelerated on other structured hardware.
3. **Gap between software and hardware**: Existing methods generate sparse models at the software level that do not match the hardware support, leading to performance and energy efficiency losses.

To address these issues, the paper proposes a new method, Tensor Approximation via Structured Decomposition (TASD), which can transform any sparse tensor into a series of structured sparse tensors. Additionally, the paper develops a software framework called TASDER to search for the optimal structured decomposition for each layer of the DNN, thereby accelerating DNNs on any system that supports structured sparse hardware.

Through this method, the paper aims to:

- Provide a flexible interface that allows DNN developers to focus on generating unstructured sparse models without worrying about hardware limitations.
- Significantly improve the computational efficiency and energy efficiency of DNNs while maintaining model accuracy.

Evaluation results show that the TASD method can accelerate off-the-shelf dense and sparse DNNs without fine-tuning, and improve the energy-delay product (EDP) by an average of 74%, with a maximum improvement of 83%.
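The decomposition idea can be illustrated with a small NumPy sketch. It is not the paper's implementation: the helper names `nm_prune` and `tasd_series`, the greedy magnitude-based selection, and the example N:M configurations are all assumptions made for illustration. Each term keeps at most N entries in every group of M consecutive elements of the current residual, and the distributive property lets the product be computed term by term.

```python
import numpy as np

def nm_prune(x, n, m):
    """Keep the n largest-magnitude entries in each group of m columns (N:M sparsity)."""
    rows, cols = x.shape
    assert cols % m == 0, "column count must be a multiple of m"
    groups = x.reshape(rows, cols // m, m)
    # Positions of the n largest |values| inside every group.
    top = np.argsort(-np.abs(groups), axis=-1)[..., :n]
    mask = np.zeros(groups.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=-1)
    return np.where(mask, groups, 0.0).reshape(rows, cols)

def tasd_series(a, configs):
    """Greedy sketch: extract one N:M term per config from the running residual."""
    terms = []
    residual = a.copy()
    for n, m in configs:
        term = nm_prune(residual, n, m)
        terms.append(term)
        residual = residual - term
    return terms, residual

# Hypothetical example: an unstructured sparse matrix (about 70% zeros).
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 16)) * (rng.random((4, 16)) < 0.3)
b = rng.standard_normal((16, 3))

terms, residual = tasd_series(a, [(2, 4), (2, 8)])

# Distributive property: A @ B = sum_i (A_i @ B) + R @ B.
# Dropping the residual term R @ B gives the structured sparse approximation.
approx = sum(t @ b for t in terms)
exact = a @ b
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"relative error of the structured approximation: {rel_err:.3e}")
```

In this sketch, adding more terms shrinks the residual, so the number and shape of the N:M configurations trade accuracy against the amount of structured sparse work. Choosing them per layer is the kind of search the summary attributes to TASDER.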
