MFIL-FCOS: A Multi-Scale Fusion and Interactive Learning Method for 2D Object Detection and Remote Sensing Image Detection

Guoqing Zhang, Wenyu Yu, Ruixia Hou
DOI: https://doi.org/10.3390/rs16060936
IF: 5
2024-03-08
Remote Sensing
Abstract: Object detection is dedicated to finding objects in an image and estimating their categories and locations. Recent object detection algorithms suffer from a loss of semantic information in the deeper feature maps due to the deepening of the backbone network. For example, when using complex backbone networks, existing feature fusion methods cannot fuse information from different layers effectively. In addition, anchor-free object detection methods fail to accurately predict the same object because the regression and centrality prediction branches use different learning mechanisms. To address these problems, we propose a multi-scale fusion and interactive learning method for fully convolutional one-stage anchor-free object detection, called MFIL-FCOS. Specifically, we design a multi-scale fusion module to address the problem of local semantic information loss in high-level feature maps; it strengthens feature extraction by enhancing the local information of low-level features and fusing the rich semantic information of high-level features. Furthermore, we propose an interactive learning module that increases interactivity between the branches and yields more accurate predictions by generating a centrality-position weight adjustment regression task and a centrality prediction task. Following these improvements, we conduct extensive experiments on the COCO and DIOR datasets, demonstrating the method's superior capabilities in 2D object detection and remote sensing image detection, even under challenging conditions.
environmental sciences; imaging science & photographic technology; remote sensing; geosciences, multidisciplinary
What problem does this paper attempt to address?
This paper addresses 2D object detection in complex backgrounds and remote-sensing image detection. Existing object detection algorithms lose semantic information in deep feature maps, and anchor-free methods predict the same object inaccurately because their regression and centrality prediction branches learn in different ways. Specifically:

1. **Semantic information loss**: As the backbone network deepens, existing feature-fusion methods cannot effectively fuse information from different layers, resulting in the loss of local semantic information in deep feature maps.
2. **Limitations of anchor-free detection methods**: Anchor-free object detection methods struggle to accurately predict the position of the same object because the regression and centrality prediction branches use different learning mechanisms.

To address these problems, the authors propose MFIL-FCOS, a multi-scale fusion and interactive learning method for fully convolutional single-stage anchor-free object detection. The main innovations of this method include:

- **Multi-scale fusion module**: By enhancing the local information of low-level features and fusing the rich semantic information of high-level features, this module addresses the loss of local semantic information in high-level feature maps.
- **Interactive learning module**: By generating centrality-position weights to adjust the regression and centrality prediction tasks, this module increases the interaction between the two branches and improves prediction accuracy.

According to the authors, these improvements enable the model to perform well in 2D object detection and remote-sensing image detection on the COCO and DIOR datasets, even under challenging conditions.
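
To make the link between the regression and centrality branches concrete, here is a minimal sketch. It is not the paper's interactive learning module, whose weight formula is not given in the text above. As a stand-in it uses the centerness definition from the original FCOS detector and, as a labeled assumption, scales an L1 regression loss by that score. The function names `ltrb_targets`, `centerness`, and `weighted_l1_regression_loss` are hypothetical.

```python
import math


def ltrb_targets(x, y, box):
    """Distances from location (x, y) to the left, top, right, and bottom
    sides of a ground-truth box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return x - x0, y - y0, x1 - x, y1 - y


def centerness(l, t, r, b):
    """Centerness as defined in the original FCOS detector:
    sqrt(min(l, r) / max(l, r) * min(t, b) / max(t, b)).
    Returns 1.0 at the box center and decays toward the edges."""
    if min(l, t, r, b) <= 0:
        return 0.0  # the location is on or outside the box boundary
    horizontal = min(l, r) / max(l, r)
    vertical = min(t, b) / max(t, b)
    return math.sqrt(horizontal * vertical)


def weighted_l1_regression_loss(pred, target, weight):
    """ASSUMPTION for illustration only: an L1 loss over the four distances,
    scaled by a centerness-derived weight, so that locations near the object
    center contribute more to the regression objective."""
    return weight * sum(abs(p - q) for p, q in zip(pred, target))


if __name__ == "__main__":
    box = (10, 20, 50, 60)
    target = ltrb_targets(30, 40, box)   # location at the box center
    weight = centerness(*target)
    pred = (18, 20, 22, 20)              # hypothetical network output
    print(target, weight, weighted_l1_regression_loss(pred, target, weight))
```

Coupling the two branches through a shared weight is one plausible reading of "interactive learning"; the paper itself defines how MFIL-FCOS actually does it.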