Composite Backbone Small Object Detection Based on Context and Multi-Scale Information with Attention Mechanism

Xinhan Jing, Xuesong Liu, Baolin Liu
DOI: https://doi.org/10.3390/math12050622
IF: 2.4
2024-02-20
Mathematics
Abstract: Object detection has gained widespread application across various domains; nevertheless, small object detection still presents numerous challenges due to the inherent limitations of small objects, such as their limited resolution and susceptibility to interference from neighboring elements. To improve the detection accuracy of small objects, this study presents a novel method that integrates context information, an attention mechanism, and multi-scale information. First, to realize feature augmentation, a composite backbone network that jointly extracts object features is employed. On this basis, to efficiently incorporate context information and focus on key features, the composite dilated convolution and attention module (CDAM) is designed, consisting of a composite dilated convolution module (CDM) and a convolutional block attention module (CBAM). Then, a feature elimination module (FEM) is introduced to reduce the feature proportion of medium and large objects on feature layers, thereby mitigating the impact of neighboring objects on small object detection. Experiments conducted on MS COCO validate the superior performance of the method compared with baseline detectors, yielding an average improvement of 0.8% in overall detection accuracy and a notable improvement of 2.7% in small object detection.
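To make the CDM + CBAM composition concrete, here is a minimal NumPy sketch of how such a module could be wired. Much of it is assumed, because the abstract does not give these details: depthwise 3×3 branches with hypothetical dilation rates 1, 2 and 3 fused by summation, and the standard CBAM layout (a shared MLP over average- and max-pooled channel descriptors, then a 7×7 convolution over channel-wise mean and max maps). The weights are random placeholders rather than trained parameters, and names such as `cdam` and `dilated_conv2d` are illustrative only, not the authors' code.

```python
import numpy as np


def effective_kernel(k: int, dilation: int) -> int:
    """Span in pixels covered by a k x k kernel at the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv2d(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """Depthwise k x k dilated convolution with zero 'same' padding.

    x: (C, H, W) feature map; w: (C, k, k), one kernel per channel (k odd).
    """
    x = x.astype(float)
    _, h, wd = x.shape
    k = w.shape[-1]
    pad = dilation * (k // 2)
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            # shifted view of the padded input sampled at the dilated tap (i, j)
            patch = xp[:, i * dilation:i * dilation + h, j * dilation:j * dilation + wd]
            out += w[:, i, j][:, None, None] * patch
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, w2):
    """CBAM channel attention: shared MLP over avg- and max-pooled descriptors."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer
    return sigmoid(mlp(x.mean(axis=(1, 2))) + mlp(x.max(axis=(1, 2))))  # (C,)


def spatial_attention(x, w_sp):
    """CBAM spatial attention: 7x7 conv over channel-wise mean and max maps."""
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    # a 2-in/1-out convolution equals the sum of the two per-channel convolutions
    return sigmoid(dilated_conv2d(desc, w_sp, 1).sum(axis=0))  # (H, W)


def cbam(x, w1, w2, w_sp):
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, w_sp)[None, :, :]


def cdam(x, branch_weights, dilations, w1, w2, w_sp):
    """Hypothetical CDAM: multi-rate dilated branches summed (CDM), then CBAM."""
    context = sum(dilated_conv2d(x, w, d) for w, d in zip(branch_weights, dilations))
    return cbam(context, w1, w2, w_sp)


rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
x = rng.standard_normal((C, H, W))
dilations = (1, 2, 3)  # assumed rates; 3x3 kernels then span 3, 5 and 7 pixels
branch_weights = [0.1 * rng.standard_normal((C, 3, 3)) for _ in dilations]
w1 = 0.1 * rng.standard_normal((C // 4, C))  # channel reduction ratio 4 (assumed)
w2 = 0.1 * rng.standard_normal((C, C // 4))
w_sp = 0.1 * rng.standard_normal((2, 7, 7))
y = cdam(x, branch_weights, dilations, w1, w2, w_sp)
print(y.shape)  # (8, 16, 16)
```

The design point the sketch illustrates is that dilation widens each branch's view of the surrounding context without adding kernel weights, and CBAM then re-weights the fused result per channel and per location.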
What problem does this paper attempt to address?
The paper addresses the challenges of small-object detection in computer vision. Because small objects have limited resolution and are easily disturbed by neighboring elements, it is difficult to extract sufficient feature information from them during detection. In addition, existing detectors rely on anchor boxes and fixed thresholds to classify proposed regions during training. This produces an uneven distribution of positive and negative samples across object sizes: small objects receive fewer positive samples, so detectors tend to detect larger targets and miss small ones. To address these problems, the paper proposes a method that combines context information, attention mechanisms, and multi-scale information to improve small-object detection accuracy. The main contributions are:

1. **Composite backbone network architecture**: two backbone networks extract and fuse features jointly, yielding more usable features and thereby higher detection accuracy.
2. **Composite dilated convolution and attention module (CDAM)**: dilated convolutions with different dilation rates fuse shallow feature maps, and a convolutional block attention module (CBAM) focuses on key features; together they incorporate context information and improve detection performance.
3. **Feature elimination module (FEM)**: by reducing the influence of medium- and large-sized objects on shallow feature maps, this module highlights small-object features and alleviates the detection problems caused by sample imbalance.

Experiments on the MS COCO dataset confirm that the method outperforms the baseline detector, improving overall detection accuracy by 0.8% on average and small-object detection accuracy by 2.7%.
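The anchor-based sample imbalance described above can be made concrete by counting how many anchors pass a fixed IoU threshold for a small and a large object. This sketch is not from the paper: the 256-pixel image, the stride of 16, the square anchor sizes (32, 64, 128) and the 0.5 positive threshold are all illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def make_anchors(image_size, stride, sizes):
    """Square anchors centred on every cell of a stride-`stride` grid (assumed layout)."""
    anchors = []
    for cy in range(stride // 2, image_size, stride):
        for cx in range(stride // 2, image_size, stride):
            for s in sizes:
                anchors.append((cx - s / 2, cy - s / 2, cx + s / 2, cy + s / 2))
    return anchors


def count_positives(gt_box, anchors, threshold=0.5):
    """Number of anchors whose IoU with the ground truth reaches the fixed threshold."""
    return sum(iou(gt_box, a) >= threshold for a in anchors)


anchors = make_anchors(image_size=256, stride=16, sizes=(32, 64, 128))
small_box = (100, 100, 120, 120)  # 20 x 20 object
large_box = (60, 60, 180, 180)    # 120 x 120 object
print(count_positives(small_box, anchors), count_positives(large_box, anchors))  # 0 13
```

Under these assumptions the 20×20 object receives no positive anchor at all: even a 32×32 anchor that fully contains it reaches only IoU = 400/1024 ≈ 0.39. By contrast, 13 of the 128-pixel anchors match the 120×120 object.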