SIS: A new multi-scale convolutional operator

Man Zhou, Xueyang Fu, Aiping Liu
DOI: https://doi.org/10.52396/JUSTC-2021-0188
2022-01-01
Journal of University of Science and Technology of China
Abstract: Visual features with high potential for generalization are critical for computer vision applications. Existing approaches typically produce multi-scale feature maps by stacking features layer by layer, which incurs high computational cost. To address this issue, we present a compact and efficient scale-in-scale convolution operator called SIS by incorporating an efficient progressive multi-scale architecture into a standard convolution operator. More precisely, the suggested operator uses the channel transform-divide-and-conquer technique to optimize conventional channel-wise computing, thereby lowering the computational cost while simultaneously expanding the receptive fields within a single convolution layer. Moreover, the proposed SIS operator incorporates weight-sharing with split-and-interact and recur-and-fuse mechanisms for enhanced variant design. The suggested SIS series is easily pluggable into any promising convolutional backbone, such as the well-known ResNet and Res2Net. Furthermore, we incorporated the proposed SIS operator series into 29-layer, 50-layer, and 101-layer ResNet as well as Res2Net variants and evaluated these modified models on the widely used CIFAR, PASCAL VOC, and COCO2017 benchmark datasets, where they consistently outperformed state-of-the-art models on a variety of major vision tasks, including image classification, keypoint estimation, semantic segmentation, and object detection.
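The abstract does not spell out the SIS operator's internals, so the following is only a rough arithmetic sketch of why dividing channels and chaining the groups can cut cost while widening receptive fields. It assumes a generic hierarchical scheme (each channel group has its own k x k convolution and receives the previous group's output, stride 1, no dilation), which is not necessarily how SIS is built. All function names are hypothetical.

```python
# Illustrative only: a generic split-channel, progressive multi-scale scheme.
# This is an assumption made for the example, NOT the SIS operator itself.

def standard_conv_params(channels: int, kernel: int = 3) -> int:
    """Weight count of a standard conv with `channels` inputs and outputs (no bias)."""
    return channels * channels * kernel * kernel


def split_conv_params(channels: int, splits: int, kernel: int = 3,
                      shared: bool = False) -> int:
    """Weight count when channels are divided into `splits` equal groups.

    Each group gets its own kernel x kernel conv over channels // splits
    channels. With shared=True, a single set of weights is reused by all groups.
    """
    if channels % splits != 0:
        raise ValueError("channels must be divisible by splits")
    width = channels // splits
    per_group = width * width * kernel * kernel
    return per_group if shared else splits * per_group


def progressive_receptive_fields(splits: int, kernel: int = 3) -> list:
    """Receptive field per group when group i is fed the output of group i-1.

    Group i has passed through i + 1 stacked convs (stride 1, no dilation),
    so its receptive field is 1 + (i + 1) * (kernel - 1).
    """
    return [1 + (i + 1) * (kernel - 1) for i in range(splits)]


if __name__ == "__main__":
    print(standard_conv_params(64))            # 36864
    print(split_conv_params(64, 4))            # 9216
    print(split_conv_params(64, 4, shared=True))  # 2304
    print(progressive_receptive_fields(4))     # [3, 5, 7, 9]
```

Under these assumptions, splitting 64 channels into 4 groups uses a quarter of the weights of a standard 3 x 3 convolution, sharing weights across groups reduces that by a further factor of 4, and the chained groups cover receptive fields from 3 to 9 within one layer.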