MFIL-FCOS: A Multi-Scale Fusion and Interactive Learning Method for 2D Object Detection and Remote Sensing Image Detection

Guoqing Zhang, Wenyu Yu, Ruixia Hou

DOI: https://doi.org/10.3390/rs16060936

IF: 5

2024-03-08

Remote Sensing

Abstract: Object detection is dedicated to finding objects in an image and estimating their categories and locations. Recent object detection algorithms suffer from a loss of semantic information in the deeper feature maps due to the deepening of the backbone network. For example, when complex backbone networks are used, existing feature fusion methods cannot fuse information from different layers effectively. In addition, anchor-free object detection methods fail to accurately predict the same object because the regression branch and the centrality prediction branch follow different learning mechanisms. To address these problems, we propose a multi-scale fusion and interactive learning method for fully convolutional one-stage anchor-free object detection, called MFIL-FCOS. Specifically, we design a multi-scale fusion module to address the problem of local semantic information loss in high-level feature maps; it strengthens feature extraction by enhancing the local information of low-level features and fusing the rich semantic information of high-level features. Furthermore, we propose an interactive learning module that increases interactivity and yields more accurate predictions by generating a centrality-position weight adjustment regression task and a centrality prediction task. Following these improvements, we conduct extensive experiments on the COCO and DIOR datasets, demonstrating the method's superior capabilities in 2D object detection and remote sensing image detection, even under challenging conditions.

environmental sciences; imaging science & photographic technology; remote sensing; geosciences, multidisciplinary

What problem does this paper attempt to address?

This paper addresses 2D object detection in complex backgrounds and remote sensing image detection. Existing object detection algorithms have two problems:

1. **Semantic information loss**: As the backbone network deepens, existing feature fusion methods cannot effectively fuse information from different layers, so local semantic information is lost in the high-level feature maps.
2. **Limitations of anchor-free detection methods**: Anchor-free object detection methods struggle to predict the position of the same object accurately, because the regression branch and the centrality prediction branch follow different learning mechanisms.

To address these problems, the authors propose MFIL-FCOS, a multi-scale fusion and interactive learning method for fully convolutional one-stage anchor-free object detection. Its main innovations are:

- **Multi-scale fusion module**: This module enhances the local information of low-level features and fuses it with the rich semantic information of high-level features, which addresses the loss of local semantic information in high-level feature maps.
- **Interactive learning module**: This module generates a regression task adjusted by centrality-position weights, together with a centrality prediction task, which increases the interaction between the two branches and the accuracy of the predictions.

With these improvements, the authors report that the model performs well on 2D object detection and remote sensing image detection on the COCO and DIOR datasets, including under challenging conditions.
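To make the terms "regression branch" and "centrality" concrete, here is a minimal Python sketch of the centerness target used by the original FCOS detector that MFIL-FCOS builds on. It is not the paper's interactive learning module: the summary above does not give that module's formulas, so the function names and the centerness-weighted loss average below are illustrative assumptions only.

```python
import math


def regression_targets(px, py, box):
    """Distances (l, t, r, b) from location (px, py) to the sides of box.

    box is (x0, y0, x1, y1). The names here are hypothetical helpers,
    not identifiers from the MFIL-FCOS paper.
    """
    x0, y0, x1, y1 = box
    return px - x0, py - y0, x1 - px, y1 - py


def centerness(l, t, r, b):
    """Baseline FCOS centerness target: 1.0 at the box center, near 0 at edges."""
    if min(l, t, r, b) <= 0:
        # The location lies on or outside the box boundary.
        return 0.0
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


def weighted_regression_loss(losses, weights):
    """Average per-location regression losses, weighted by centerness.

    Assumption: this is one simple way to let centrality influence the
    regression task; the paper's actual weighting scheme may differ.
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * loss for w, loss in zip(weights, losses)) / total


box = (0.0, 0.0, 10.0, 10.0)
center_score = centerness(*regression_targets(5.0, 5.0, box))
off_center_score = centerness(*regression_targets(2.0, 5.0, box))
edge_score = centerness(*regression_targets(0.0, 5.0, box))
loss = weighted_regression_loss([1.0, 3.0], [center_score, off_center_score])
```

In this sketch, a location at the box center gets the highest weight, so its regression error contributes most to the loss, while locations near the boundary are down-weighted.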

Dynamic Convolution Covariance Network Using Multi-Scale Feature Fusion for Remote Sensing Scene Image Classification

An Adaptive Attention Fusion Mechanism Convolutional Network for Object Detection in Remote Sensing Images

Multiscale Semantic Fusion-Guided Fractal Convolutional Object Detection Network for Optical Remote Sensing Imagery

Adaptively Attentional Feature Fusion Oriented to Multiscale Object Detection in Remote Sensing Images

ℱ3-Net: Feature Fusion and Filtration Network for Object Detection in Optical Remote Sensing Images

Improving Object Detection in YOLOv8n with the C2f-f Module and Multi-Scale Fusion Reconstruction

MMYFnet: Multi-Modality YOLO Fusion Network for Object Detection in Remote Sensing Images

A Multi-Feature Fusion and Attention Network for Multi-Scale Object Detection in Remote Sensing Images

Object Detection of Remote Sensing Image Based on Multi-Scale Feature Fusion and Attention Mechanism

A Multiscale Information Fusion Network Based on PixelShuffle Integrated With YOLO for Aerial Remote Sensing Object Detection

Cascaded Cross-Modality Fusion Network for 3D Object Detection

MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection

Oriented Object Detection Based on Cross-Scale Information Fusion

A Task-Balanced Multiscale Adaptive Fusion Network for Object Detection in Remote Sensing Images

Learning Adaptive Fusion Bank for Multi-modal Salient Object Detection

MMAF-Net: Multi-view multi-stage adaptive fusion for multi-sensor 3D object detection

ACDF-YOLO: Attentive and Cross-Differential Fusion Network for Multimodal Remote Sensing Object Detection

SFSANet: Multiscale Object Detection in Remote Sensing Image Based on Semantic Fusion and Scale Adaptability

Multi-Modal Object Detection Method Based on Dual-Branch Asymmetric Attention Backbone and Feature Fusion Pyramid Network

MSF-YOLO: A multi-scale features fusion-based method for small object detection