Determining optimal channel partition for 2:4 fine grained structured sparsity

Mohit Mahajan, Wen-Mei Hwu, Rakesh Nagi
DOI: https://doi.org/10.1007/s11590-023-02084-8
IF: 1.5288
2024-01-13
Optimization Letters
Abstract: Deep Neural Networks (DNNs) have demonstrated tremendous success in many applications, but incur a high computational burden on the inference side. The 2:4 sparsity pruning method has recently been developed to effectively compress and accelerate DNNs with little to no loss in performance. The method comprises a training phase followed by a pruning step in which 2 out of every 4 consecutive weights are eliminated to obtain a pruned matrix, which is then retrained to fine-tune the remaining weights. The accuracy of the resultant sparse network is maximized by permuting the matrix along the channel dimension in a way that maximizes the total magnitude of the weights preserved during pruning. While earlier works have proposed heuristic methods to generate good permutations, we formalize the problem as a discrete optimization problem. In this paper, we propose four different mathematical programs to determine the optimal permutations and compare their performance on small-sized instances using a standard solver. Further, we develop a complementary column generation scheme to solve instances for DNNs with a realistic number of channels.
Categories: mathematics, applied; operations research & management science
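
The objective described in the abstract can be illustrated with a minimal sketch. It is not one of the paper's four mathematical programs and not its column generation scheme; it simply enumerates all column permutations, which is feasible only for toy instances. The sketch assumes that channels correspond to matrix columns, that the number of columns is a multiple of 4, and that pruning keeps the two largest-magnitude weights in each group of four consecutive entries in a row. The function names `retained_magnitude` and `best_permutation_bruteforce` are hypothetical.

```python
import itertools

import numpy as np


def retained_magnitude(W, perm):
    """Total |weight| kept by 2:4 pruning after reordering columns by `perm`.

    Assumes the column count is a multiple of 4 (an assumption of this sketch).
    """
    A = np.abs(np.asarray(W, dtype=float)[:, list(perm)])
    rows, cols = A.shape
    # Split every row into consecutive groups of four weights.
    groups = A.reshape(rows, cols // 4, 4)
    # Sort each group ascending and keep the two largest magnitudes.
    top2 = np.sort(groups, axis=2)[:, :, 2:]
    return float(top2.sum())


def best_permutation_bruteforce(W):
    """Exhaustive search over column permutations (toy sizes only)."""
    cols = np.asarray(W).shape[1]
    best_perm, best_val = None, -1.0
    for perm in itertools.permutations(range(cols)):
        val = retained_magnitude(W, perm)
        if val > best_val:
            best_perm, best_val = list(perm), val
    return best_perm, best_val


if __name__ == "__main__":
    # One row, eight channels: all large weights start in the same group.
    W = np.array([[4.0, 3.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]])
    print(retained_magnitude(W, range(8)))  # identity ordering keeps 4 + 3
    print(best_permutation_bruteforce(W))
```

In this toy matrix the identity ordering keeps only 4 and 3 from the first group, whereas a permutation that pairs {4, 3} and {2, 1} with zeros in separate groups keeps all nonzero weights. The factorial growth of the search space is what motivates exact formulations and decomposition schemes for realistic channel counts.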