Abstract: Network sparsification serves as an effective technique to accelerate Deep Neural Network (DNN) inference. However, existing sparsification techniques often rely on unstructured sparsity, which yields limited benefits. This is primarily due to the significant memory and computational overhead introduced by numerous sparse storage formats during address generation and gradient updates. Additionally, many of these solutions are tailored solely for the inference phase, neglecting the crucial training phase. In this paper, we introduce STCO, a novel Sparse Tensor Compilation Optimization technique that significantly enhances training efficiency through structured sparse tensor compilation. Central to STCO is the Tensorization-aware Index Entity (TIE) format, which effectively represents structured sparse tensors by eliminating redundant indices and minimizing storage overhead. The TIE format plays a pivotal role in the Address-Carry flow (AC flow) pass, which optimizes the data layout at the computational graph level. This pass leverages the TIE format to enable more compact and efficient sparse tensor storage. Meanwhile, a shape inference pass utilizes the AC flow to derive optimized tensor shapes, further refining the performance of sparse tensor operations. Moreover, the Address-Carry TIE Flow dynamically tracks nonzero addresses, extending the benefits of sparse optimization to both forward and backward propagation. This seamless integration into the training pipeline enables a smooth transition to sparse tensor compilation without significant modifications to existing codebases. To further boost training performance, we implement an operator-level AC flow optimization pass tailored for structured sparse tensors. This pass generates efficient addresses, ensuring minimal computational overhead during sparse tensor operations.
The flexibility of STCO allows it to be efficiently integrated into various frameworks or compilers, providing a robust solution for enhancing training efficiency with structured sparse tensors. Experiments demonstrated that STCO achieved speedups of 3.64×, 5.43×, 4.89×, and 3.91× when compared to state-of-the-art sparse formats on VGG16, ResNet-18, MobileNetV1, and MobileNetV2, respectively. These findings underscore the efficiency and superiority of our proposed approach in leveraging structured sparsity for Deep Neural Network training acceleration.
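The claim that a structured format can eliminate redundant indices can be illustrated with a small index-count comparison. The sketch below is not the TIE format, whose layout the abstract does not specify. It assumes a generic block-structured layout in which every nonzero tile stores a single coordinate pair, and compares it with a per-element coordinate (COO) layout. The function names and the sample matrix are hypothetical.

```python
# Illustrative only: NOT the TIE format from the paper. It shows why storing
# one index pair per nonzero block needs fewer index integers than storing
# one index pair per nonzero element.

def coo_index_count(matrix):
    """Index integers stored by COO: one (row, col) pair per nonzero."""
    return sum(2 for row in matrix for value in row if value != 0)


def block_index_count(matrix, block):
    """Index integers stored when each nonzero block x block tile keeps a
    single (block_row, block_col) pair (assumed layout)."""
    rows, cols = len(matrix), len(matrix[0])
    count = 0
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            tile_has_nonzero = any(
                matrix[r][c] != 0
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            if tile_has_nonzero:
                count += 2
    return count


# Hypothetical 4x4 weight matrix with two dense 2x2 nonzero blocks.
weights = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 5, 6],
    [0, 0, 7, 8],
]

print(coo_index_count(weights))       # 8 nonzeros, 2 integers each
print(block_index_count(weights, 2))  # 2 nonzero tiles, 2 integers each
```

In this toy case the per-element layout stores 16 index integers and the block layout stores 4. The saving grows with block size as long as the sparsity pattern stays aligned to the blocks.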

Taming Unstructured Sparsity on GPUs Via Latency-Aware Optimization

Accelerating Sparse DNN Models Without Hardware-Support Via Tile-Wise Sparsity

TSTC: Two-Level Sparsity Tensor Core Enabling Both Algorithm Flexibility and Hardware Efficiency

Releasing the Potential of Tensor Core for Unstructured SpMM Using Tiled-CSR Format

Performance of Training Sparse Deep Neural Networks on GPUs

Shfl-BW: Accelerating Deep Neural Network Inference with Tensor-Core Aware Weight Pruning

Balanced Sparsity for Efficient DNN Inference on GPU

DSTC: Dual-Side Sparsity Tensor Core for DNNs Acceleration on Modern GPU Architectures

Accelerating Sparse DNNs Based on Tiled GEMM

Accelerating Sparse Deep Neural Network Inference Using GPU Tensor Cores

Dual-side Sparse Tensor Core

Accelerating Sparse Graph Neural Networks with Tensor Core Optimization

TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs

TSTC: Enabling Efficient Training Via Structured Sparse Tensor Compilation

STCO: Enhancing Training Efficiency Via Structured Sparse Tensor Compilation Optimization

Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition

Performance Modeling and Optimization of Sparse Matrix-Vector Multiplication on NVIDIA CUDA Platform

Optimizing the Linear Fascicle Evaluation Algorithm for Multi-Core and Many-Core Systems

Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity

Tensor Core-Adapted Sparse Matrix Multiplication for Accelerating Sparse Deep Neural Networks

Structured Term Pruning for Computational Efficient Neural Networks Inference