A Novel Attention‐Based Layer Pruning Approach for Low‐Complexity Convolutional Neural Networks
Bipul Hossain, Na Gong, Mohamed Shaban, Md. Bipul Hossain
DOI: https://doi.org/10.1002/aisy.202400161
IF: 7.298
2024-06-08
Advanced Intelligent Systems
Abstract: Herein, AI‐inspired attention‐based filter and layer pruning methods are introduced that substantially reduce the number of learning parameters, memory units, floating‐point operations, and computational time of deep learning (DL) models as compared to state‐of‐the‐art structural pruning techniques. This facilitates the realization of DL on resource‐constrained edge devices and expedites the analysis of high‐resolution images. DL has been very successful in classifying images, detecting targets, and segmenting regions in high‐resolution images such as whole slide histopathology images. However, analysis of such high‐resolution images requires highly complex DL models. Several AI optimization techniques have recently been proposed that aim to reduce the complexity of deep neural networks, thereby expediting their execution and eventually allowing the use of low‐power, low‐cost computing devices with limited computation and memory resources. These methods include parameter pruning and sharing, quantization, knowledge distillation, low‐rank approximation, and resource‐efficient architectures. Rather than pruning network structures (filters, layers, and blocks of layers) according to a manually selected significance metric such as the l1‐norm or l2‐norm of the filter kernels, novel, highly efficient AI‐driven DL optimization algorithms are introduced that use variations of squeeze‐and‐excitation to prune filters and layers of deep models such as VGG‐16 and to eliminate filters and blocks of residual networks such as ResNet‐56. The proposed techniques achieve a significantly higher reduction in the number of learning parameters, the number of floating‐point operations, and memory space as compared to state‐of‐the‐art methods.
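To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' algorithm, of how a generic squeeze‐and‐excitation block could score filters for pruning, set next to the manual l1‐norm metric. The helper names (`squeeze`, `excite`, `l1_norm_scores`, `keep_top`), the toy feature maps, and the hand‐set excitation weights are all assumptions made for illustration; in practice the excitation weights would be learned during training.

```python
import math

def squeeze(feature_maps):
    """Squeeze step: global average pooling, one scalar per channel."""
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
            for fm in feature_maps]

def dense(x, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def excite(z, w1, b1, w2, b2):
    """Excitation step: FC -> ReLU -> FC -> sigmoid, one score per channel."""
    hidden = [max(0.0, h) for h in dense(z, w1, b1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2, b2)]

def l1_norm_scores(kernels):
    """Manual baseline metric: l1-norm of each (2D) filter kernel."""
    return [sum(abs(w) for row in k for w in row) for k in kernels]

def keep_top(scores, keep):
    """Indices of the `keep` highest-scoring filters, in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])

# Toy input (assumed): three channels of 2x2 activations.
feature_maps = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[4.0, 4.0], [4.0, 4.0]],
]

# Hypothetical excitation weights (3 -> 2 -> 3); learned in a real model.
w1, b1 = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0]
w2, b2 = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]], [0.0, 0.0, 0.0]

attention = excite(squeeze(feature_maps), w1, b1, w2, b2)
kept = keep_top(attention, keep=2)  # drop the lowest-attention filter
print(kept)
```

In this sketch the attention scores play the role that the l1‐norm plays in metric‐based pruning: both produce one number per filter, and `keep_top` can rank either. The difference is that the attention scores come from a small network driven by the activations rather than from a fixed, manually chosen formula over the kernel weights.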