Detail injection with heterogeneous composite backbone network for object detection

Zhiwei Yan, Huicheng Zheng, Ye Li
DOI: https://doi.org/10.1007/s11042-022-12241-3
IF: 2.577
2022-01-01
Multimedia Tools and Applications
Abstract: Current detectors usually rely on backbone networks that were originally designed for image classification and pretrained on large image classification datasets, which makes them well suited to modeling global information. As a consequence, most detectors struggle to detect small objects because they rapidly lose the local spatial details that are critical for accurate localization. In this work, we propose a backbone network, called the heterogeneous composite backbone, that joins two backbones with different structures: an off-the-shelf classification-oriented backbone, whose deep features provide global information, and a redesigned detail extraction backbone, whose features retain more detailed spatial information. The new backbone is shown to be beneficial for modeling fine-grained local information. Furthermore, to ensure that the features from the randomly initialized detail extraction network are not suppressed during end-to-end training, we explore a new training scheme that combines features from a pretrained deep backbone with features generated by a network trained nearly from scratch. Experiments on benchmark datasets, including PASCAL VOC and MS COCO, demonstrate that the proposed backbone network achieves considerable improvements in object detection.