Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition

Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna
2024-04-01
Abstract: Exploiting sparsity in deep neural networks (DNNs) has been a promising area to meet the growing computation needs of modern DNNs. However, in practice, sparse DNN acceleration still faces a key challenge. To minimize the overhead of sparse acceleration, hardware designers have recently proposed structured sparse hardware support, which provides limited flexibility and requires extra model fine-tuning. Moreover, any sparse model fine-tuned for certain structured sparse hardware cannot be accelerated by other structured hardware. To bridge the gap between sparse DNN models and hardware, this paper proposes tensor approximation via structured decomposition (TASD), which leverages the distributive property in linear algebra to turn any sparse tensor into a series of structured sparse tensors. Next, we develop a software framework, TASDER, to accelerate DNNs by searching for layer-wise, high-quality structured decompositions of both weight and activation tensors so that they can be accelerated by any system with structured sparse hardware support. Evaluation results show that, by exploiting prior structured sparse hardware baselines, our method can accelerate off-the-shelf dense and sparse DNNs without fine-tuning and improves energy-delay product by up to 83%, and by 74% on average.
Artificial Intelligence, Hardware Architecture
What problem does this paper attempt to address?
The paper attempts to address the challenges faced when leveraging sparsity to accelerate computation in deep neural networks (DNNs). Specifically, existing sparse DNN acceleration methods have the following issues:

1. **Limited hardware support**: To reduce the overhead of sparse acceleration, hardware designers have proposed structured sparse hardware support. However, this support has low flexibility and requires additional model fine-tuning.
2. **Strong hardware dependency**: Sparse models fine-tuned for specific structured sparse hardware cannot be accelerated on other structured hardware.
3. **Gap between software and hardware**: Existing methods generate sparse models at the software level that do not match the hardware support, leading to performance and energy efficiency losses.

To address these issues, the paper proposes a new method, Tensor Approximation via Structured Decomposition (TASD), which can transform any sparse tensor into a series of structured sparse tensors. Additionally, the paper develops a software framework called TASDER to search for the optimal structured decomposition for each layer of the DNN, thereby accelerating DNNs on any system that supports structured sparse hardware. Through this method, the paper aims to:

- Provide a flexible interface that allows DNN developers to focus on generating unstructured sparse models without worrying about hardware limitations.
- Significantly improve the computational efficiency and energy efficiency of DNNs while maintaining model accuracy.

Evaluation results show that the TASD method can accelerate off-the-shelf dense and sparse DNNs without fine-tuning, and improve the energy-delay product (EDP) by an average of 74%, with a maximum improvement of 83%.
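
The idea of turning a sparse tensor into a series of structured sparse tensors can be illustrated with a small NumPy sketch. This is not the authors' TASDER implementation. The helper names `nm_prune` and `tasd_series`, the greedy residual scheme, and the example patterns 2:4 and 2:8 are assumptions chosen for illustration. The sketch only relies on the usual N:M definition (at most N nonzeros in every group of M consecutive elements) and on the distributive property, (A1 + A2)B = A1B + A2B.

```python
import numpy as np


def nm_prune(x, n, m):
    """Keep the n largest-magnitude entries in every group of m consecutive
    elements along each row and zero the rest (N:M structured sparsity)."""
    rows, cols = x.shape
    assert cols % m == 0, "row length must be a multiple of m"
    groups = x.reshape(rows, cols // m, m)
    # Indices of the (m - n) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(groups), axis=-1)[..., : m - n]
    mask = np.ones(groups.shape, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    return (groups * mask).reshape(rows, cols)


def tasd_series(x, patterns):
    """Hypothetical greedy decomposition: each term is an N:M structured view
    of whatever the previous terms left behind (the residual)."""
    terms = []
    residual = x.copy()
    for n, m in patterns:
        term = nm_prune(residual, n, m)
        terms.append(term)
        residual = residual - term
    return terms, residual


# Illustrative data: an unstructured sparse "weight" matrix and a dense input.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16))
w[rng.random(w.shape) < 0.6] = 0.0
b = rng.standard_normal((16, 4))

terms, residual = tasd_series(w, [(2, 4), (2, 8)])

# Distributive property: each structured term is multiplied separately
# (as structured sparse hardware would do) and the partial results are summed.
approx = sum(t @ b for t in terms)
exact = w @ b

print("dropped weight magnitude:", np.linalg.norm(residual))
print("output approximation error:", np.linalg.norm(exact - approx))
```

Each term here satisfies an N:M pattern, so each partial product could in principle be mapped to hardware that supports that pattern. Any nonzeros left in `residual` are what the approximation drops, and adding more terms to `patterns` shrinks that residual at the cost of more structured multiplications.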