MSSD: multi-scale object detector based on spatial pyramid depthwise convolution and efficient channel attention mechanism
Yipeng Zhou, Huaming Qian, Peng Ding
DOI: https://doi.org/10.1007/s11554-023-01358-9
IF: 2.293
2023-09-03
Journal of Real-Time Image Processing
Abstract: Object detection has advanced rapidly across many fields, but in complex application scenarios the target features are often inconspicuous and the scale range is large, so detectors fall short of the desired results, especially for small targets. This paper proposes MSSD, a multi-scale object detector that improves on SSD with spatial pyramid depthwise convolution (SPDC) and an efficient channel attention mechanism (ECAM). First, ResNet50 replaces VGG as the backbone to obtain more representative features. Second, a plug-and-play spatial pyramid depthwise convolution module (SPDC) is proposed to enlarge the receptive field and strengthen multi-scale feature extraction. Third, we design a straightforward efficient channel attention mechanism (ECAM) that reweights features along the channel dimension to derive more robust features. Finally, a feature pyramid network with ECAM (ECAM-FPN) is introduced at the prediction feature layers for deep feature fusion, yielding multi-scale features rich in both semantic and detail information. For 300 × 300 input, MSSD achieves 82.5 mAP on the PASCAL VOC07+12 dataset at 56 FPS and 48.2 mAP on the MS COCO2017 dataset, which are 8.2 and 7.0 higher than SSD(300), respectively. Detection of small targets improves by 0.8 on COCO, and by 6.5 when the input is scaled to 512 × 512. The proposed method delivers significant gains in cross-scale target detection while meeting real-time requirements, and is comparable to other methods.
computer science, artificial intelligence; engineering, electrical & electronic; imaging science & photographic technology
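The abstract describes ECAM only as a mechanism that scales feature weights per channel, so the following is a minimal sketch of generic channel attention, not the authors' module. It assumes a common pattern (global average pooling, a small 1D convolution across channels, a sigmoid gate); the function name `channel_attention` and the `kernel` values are hypothetical and chosen for illustration.

```python
import math


def channel_attention(feature_map, kernel):
    """Rescale each channel of a feature map by a gate in (0, 1).

    feature_map: list of C channels, each an H x W list of rows of floats.
    kernel: odd-length list of 1D convolution weights shared across
            channels (hypothetical values; learned in a real network).
    Returns (scaled_feature_map, gates).
    """
    # 1. Squeeze: global average pooling gives one descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]

    # 2. Local cross-channel interaction: 1D convolution with zero padding,
    #    so each channel's score depends on its neighbouring channels.
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(pooled))
    ]

    # 3. Gate: the sigmoid maps each score to a weight in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-m)) for m in mixed]

    # 4. Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[g * v for v in row] for row in ch]
        for ch, g in zip(feature_map, gates)
    ]
    return scaled, gates
```

The output keeps the input's shape, which is what lets a module of this kind be dropped between existing layers; how ECAM is actually placed inside the FPN is not specified in the abstract.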