Accelerating deep neural network filter pruning with mask-aware convolutional computations on modern CPUs

Xiu Ma, Guangli Li, Lei Liu, Huaxiao Liu, Xueying Wang
DOI: https://doi.org/10.1016/j.neucom.2022.07.006
IF: 6
2022-09-21
Neurocomputing
Abstract: Filter pruning, a representative model compression technique, has been widely used to compress and accelerate sophisticated deep neural networks on resource-constrained platforms. Nevertheless, most studies focus on reducing the cost of model inference and neglect the heavy computational burden of the pruning optimization process itself. In this paper, we propose MaskACC, a mask-aware convolutional computation method that accelerates the prevailing mask-based filter pruning process on modern CPU platforms. MaskACC dynamically reorganizes the tensors used in convolutions according to the mask information to avoid unnecessary computations, thereby improving the computational efficiency of the pruning process. Evaluation with state-of-the-art neural network models on CPU cloud platforms demonstrates the effectiveness of our method, which achieves up to 1.61× speedup under commonly used pruning rates compared to conventional computations.
computer science, artificial intelligence
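The idea summarized in the abstract, skipping the work for filters that the pruning mask has switched off, can be illustrated with a small sketch. This is not the authors' MaskACC implementation: the function names (`conv2d_filter`, `conv_masked_dense`, `conv_mask_aware`), the nested-list tensor layout, and the naive loop-based convolution are all assumptions made for illustration. The sketch only contrasts a conventional masked convolution (compute every filter, then multiply by the mask) with a mask-aware one (compute only the kept filters).

```python
# Illustrative sketch only; names and data layout are hypothetical,
# not taken from the MaskACC paper.

def conv2d_filter(x, w):
    """Valid cross-correlation of input x [C][H][W] with one filter w [C][K][K]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K = len(w[0])
    out = [[0.0] * (W - K + 1) for _ in range(H - K + 1)]
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            out[i][j] = sum(
                x[c][i + a][j + b] * w[c][a][b]
                for c in range(C) for a in range(K) for b in range(K)
            )
    return out


def conv_masked_dense(x, weights, mask):
    """Conventional approach: compute all filters, then zero the pruned ones."""
    outs = [conv2d_filter(x, w) for w in weights]
    return [[[v * m for v in row] for row in o] for o, m in zip(outs, mask)]


def conv_mask_aware(x, weights, mask):
    """Mask-aware approach: compute only filters whose mask entry is 1."""
    H, W, K = len(x[0]), len(x[0][0]), len(weights[0][0])
    # Pruned output channels are filled with zeros without any multiply-adds.
    out = [[[0.0] * (W - K + 1) for _ in range(H - K + 1)] for _ in mask]
    for f, m in enumerate(mask):
        if m:
            out[f] = conv2d_filter(x, weights[f])
    return out
```

Both functions return the same output tensor, but the mask-aware version performs convolution work only for the kept filters, so under this simplified model its cost shrinks roughly in proportion to the pruning rate. How MaskACC actually reorganizes tensors on CPUs is described in the paper itself.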