ParaLkResNet: an efficient multi-scale image classification network

Tongshuai Yu, Ye Liu, Hao Liu, Ji Chen, Xing Wang
DOI: https://doi.org/10.1007/s00371-024-03508-x
IF: 2.835
2024-06-14
The Visual Computer
Abstract: Recently, deep neural networks have achieved remarkable results in computer vision tasks with the widely used visual attention mechanism. However, introducing visual attention increases the parameter count and computational complexity, which limits its application in resource-constrained environments. To solve this problem, we propose a novel convolutional block, the ParaLk block (PLB), a large-kernel parallel convolutional block. Additionally, we apply PLB to PreActResNet by replacing the first 2D convolution so that the network captures feature maps at different scales, and we call this new network ParaLkResNet. In practice, the effective receptive field of a convolutional network is smaller than its theoretical receptive field. Therefore, the PLB is used to increase the receptive field of the network. Besides extracting multi-scale, highly fused features compared with ordinary 2D convolution, it has low latency in typical downstream tasks and scales well to different data. It is worth noting that PLB, as a plug-in block, can be applied to various computer vision tasks and is not limited to image classification. The proposed method outperforms most current classification networks in image classification. The accuracy on the CIFAR-10 dataset is improved by 2.42% and 0.66% compared to OTTT and IM-Loss, respectively. Our source code is available at: https://doi.org/10.5281/zenodo.11204902.
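The receptive-field argument in the abstract can be illustrated with a short calculation. The sketch below is not taken from the paper's code. It computes the theoretical receptive field of a plain stack of convolutions using the standard recurrence, where each layer adds (kernel_size - 1) times the accumulated stride. The function name and the kernel sizes (3 and 13) are hypothetical choices for illustration; the abstract does not state the kernel sizes used in PLB. The result is an upper bound, and the effective receptive field is typically smaller.

```python
def theoretical_receptive_field(layers):
    """Return the theoretical receptive field of a stack of conv layers.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    Dilation is assumed to be 1 for every layer.
    """
    rf = 1      # a single output unit initially "sees" one input pixel
    jump = 1    # spacing, in input pixels, between adjacent units at this depth
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks: three 3x3 convolutions versus the same stack
# with the first convolution swapped for a 13x13 kernel.
small_first = [(3, 1), (3, 1), (3, 1)]
large_first = [(13, 1), (3, 1), (3, 1)]

print(theoretical_receptive_field(small_first))  # 7
print(theoretical_receptive_field(large_first))  # 17
```

Under these assumed sizes, replacing only the first convolution with a large kernel more than doubles the theoretical receptive field of the stack, which is the kind of effect the abstract attributes to PLB.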
computer science, software engineering