Abstract: Detecting small objects in drone imagery is challenging because low resolution and blending with the background leave little feature information. Multiscale feature fusion can enhance detection by capturing information at different scales, but traditional strategies fall short. Simple concatenation or addition operations do not fully exploit the advantages of multiscale fusion, resulting in insufficient correlation between features. This inadequacy hinders the detection of small objects, especially in complex backgrounds and densely populated areas. To address this issue while making efficient use of limited computational resources, we propose a lightweight fusion strategy based on enhanced interlayer feature correlation (EFC) to replace the traditional feature fusion strategy in the feature pyramid network (FPN). The semantic expressions of different layers in the feature pyramid are inconsistent. In EFC, the grouped feature focus unit (GFF) enhances the feature correlation of each layer by focusing on the contextual information of different features. The multilevel feature reconstruction module (MFR) reconstructs and transforms the strength and weakness information of each layer in the pyramid to reduce redundant feature fusion and retain more information about small targets in deep networks. The proposed method is plug-and-play and can be applied to a wide range of base networks. Extensive experiments and comprehensive evaluations on VisDrone, Unmanned Aerial Vehicle Benchmark Object Detection and Tracking (UAVDT), and Microsoft Common Objects in Context (COCO) demonstrate its effectiveness. Using generalized focal loss (GFL) as the baseline on the VisDrone dataset, which contains a large number of small targets, the proposed method improves the detection mean average precision (mAP) by 1.7%, surpassing many lightweight state-of-the-art methods while significantly reducing the parameter count and GFLOPs of the neck.
The code will be available at https://github.com/nuliweixiao/EFC.git.
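
For readers unfamiliar with the baseline being replaced, the following is a minimal sketch of the two traditional fusion operations named above, addition and concatenation, applied to two pyramid levels. It is not the EFC module, GFF, or MFR. The function names, the nested-list `[C][H][W]` layout, and the toy inputs are assumptions chosen for illustration, and the lateral 1x1 convolution that an FPN normally applies before fusion is omitted.

```python
# Illustrative baseline only: the two traditional FPN fusion operations
# (element-wise addition and channel concatenation). This is NOT EFC;
# all names and the nested-list layout are hypothetical.

def upsample_nearest_2x(feat):
    """Nearest-neighbour 2x upsampling of a [C][H][W] nested-list feature map."""
    out = []
    for channel in feat:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column twice
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row twice
        out.append(rows)
    return out

def fuse_add(shallow, deep):
    """Addition fusion: upsample the deep map, then sum element-wise."""
    up = upsample_nearest_2x(deep)
    return [
        [[a + b for a, b in zip(row_s, row_d)] for row_s, row_d in zip(ch_s, ch_d)]
        for ch_s, ch_d in zip(shallow, up)
    ]

def fuse_concat(shallow, deep):
    """Concatenation fusion: upsample the deep map, then stack channels."""
    return shallow + upsample_nearest_2x(deep)

# Toy inputs: a 1-channel 2x2 shallow map and a 1-channel 1x1 deep map.
shallow = [[[1.0, 2.0], [3.0, 4.0]]]
deep = [[[10.0]]]

added = fuse_add(shallow, deep)       # same channel count as the inputs
stacked = fuse_concat(shallow, deep)  # channel count is the sum of both inputs
```

Both operations treat every channel and position identically. Neither models how the features of one layer relate to those of another, which is the limitation the abstract attributes to traditional fusion.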

MFLFC: Multi-Frame Fusion Based Low-Resolution Feature Compression for Object Tracking

Multilevel Spatial-Temporal Feature Aggregation for Video Object Detection

Residual based hierarchical feature compression for multi-task machine vision

Video object matching across multiple non-overlapping camera views based on multi-feature fusion and incremental learning

FVC: A New Framework Towards Deep Video Compression in Feature Space

FVC: An End-to-End Framework Towards Deep Video Compression in Feature Space

Learnt Mutual Feature Compression for Machine Vision

End-to-End Learnable Multi-Scale Feature Compression for VCM

End-to-End Learned Scalable Multilayer Feature Compression for Machine Vision Tasks

Rate-Performance-Loss Optimization for Inter-Frame Deep Feature Coding from Videos

Multi-Channel Fused Lasso for Motion Detection in Dynamic Video Scenarios

High Efficiency Deep-learning Based Video Compression

MFACNet: A Multi-Frame Feature Aggregating and Inter-Feature Correlation Framework for Multi-Object Tracking in Satellite Videos

Spatial-Temporal Transformer based Video Compression Framework

A Weight-adaptive Algorithm of Multi Feature Fusion Based on Kernel Correlation Filtering for Target Tracking

DMVC: Multi-Camera Video Compression Network aimed at Improving Deep Learning Accuracy

MMF-Track: Multi-modal Multi-level Fusion for 3D Single Object Tracking

A Lightweight Fusion Strategy With Enhanced Interlayer Feature Correlation for Small Object Detection

Multi-object tracking via deep feature fusion and association analysis

A Lightweight Fusion Strategy with Enhanced Inter-layer Feature Correlation for Small Object Detection

Adaptive Features Fusion Correlation Filter for Real-time Object Tracking