Abstract: In this paper, an improved detector, MRTMDet, is proposed to overcome the challenges of complex background noise and large scale variations in oriented object detection for remote sensing images by designing an innovative feature extraction network and feature fusion network. These networks integrate a lightweight vision transformer and a multi‐scale feature extraction module in different structures, improving the quality of the feature representation and strengthening the model's ability to perceive both global and multi‐scale features. Ablation and comparison experiments on the publicly available DIOR‐R dataset show that the model achieves strong overall performance and balances precision with a lightweight design.

Object detection in remote sensing images aims to interpret images to obtain the category and location of potential targets, which is of great importance in traffic detection, marine supervision, and space reconnaissance. However, the complex backgrounds and large scale variations in remote sensing images present significant challenges. Traditional methods relied mainly on image filtering or feature descriptors to extract features, resulting in poor performance. Deep learning methods, especially one‐stage detectors such as the Real‐Time Object Detector (RTMDet), offer advanced solutions with efficient network architectures. Nevertheless, the difficulty of extracting features from complex backgrounds and of localising targets in images with scale variations limits detection accuracy. In this paper, an improved detector based on RTMDet, called the Multi‐Scale Feature Extraction‐assist RTMDet (MRTMDet), is proposed, which addresses these limitations through enhanced feature extraction and fusion networks. At the core of MRTMDet are a new backbone network, MobileViT++, and a feature fusion network, SFC‐FPN. Both are built around a carefully designed hybrid CNN‐transformer feature processing unit that combines a vision transformer (ViT) with poly‐scale convolution (PSConv), strengthening the model's ability to capture global and multi‐scale features, respectively. Experiments on DIOR‐R demonstrate that MRTMDet achieves a competitive 62.2% mAP, balancing precision with a lightweight design.
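
The abstract does not describe how PSConv is wired into MobileViT++ or SFC‐FPN. As a rough illustration only of the general idea behind poly‐scale convolution (giving different channels of one layer different dilation rates, so that a single layer mixes several receptive‐field sizes), a minimal pure‐Python 1‐D sketch is given here. The function names, the 1‐D setting, the shared kernel, and the dilation set (1, 2, 4) are assumptions made for this example and are not taken from the paper.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1-D cross-correlation with a dilated kernel."""
    k = len(kernel)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spread `dilation` samples apart around position i.
            idx = i + (j - k // 2) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def poly_scale_conv1d(channels, kernel, dilations=(1, 2, 4)):
    """Illustrative poly-scale layer (hypothetical helper, not the paper's code).

    Each input channel is filtered with a dilation rate picked cyclically
    from `dilations`; the per-channel results are summed into one output.
    """
    out = [0.0] * len(channels[0])
    for c, signal in enumerate(channels):
        d = dilations[c % len(dilations)]
        for i, v in enumerate(dilated_conv1d(signal, kernel, d)):
            out[i] += v
    return out


def receptive_field(kernel_size, dilation):
    """Span (in samples) covered by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    # Three identical channels, each seen at a different scale.
    print(poly_scale_conv1d([impulse] * 3, [1, 1, 1]))
    # -> [1.0, 0.0, 1.0, 1.0, 3.0, 1.0, 1.0, 0.0, 1.0]
    print([receptive_field(3, d) for d in (1, 2, 4)])  # -> [3, 5, 9]
```

With a 3‐tap kernel, the three dilation rates cover spans of 3, 5, and 9 samples, so the summed output responds to the same impulse at several scales at once. This is the property that makes such a layer attractive for targets whose size varies widely.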
