Abstract: SSD and YOLOv5 are representative one-stage object detection algorithms. This paper proposes an improved one-stage object detector based on YOLOv5, named Multi-scale Feature Cross-layer Fusion Network (M-FCFN). First, we extract shallow and deep features from the PANet structure and fuse them across layers, obtaining an output feature scale different from the existing 80 × 80, 40 × 40, and 20 × 20 scales. Then, following the single shot multi-box detector (SSD), we apply dimension reduction to the features obtained by cross-layer fusion and use the result as another output for prediction. Two new feature scales are thus added as outputs. Features of different scales are necessary for detecting objects of different sizes; they increase the probability of detecting an object and significantly improve detection accuracy. Finally, for the Autoanchor mechanism of YOLOv5, we propose an EIOU k-means calculation. We compare our method against the four YOLOv5 model structures S, M, L, and X. Missed and false detections of large objects are reduced, giving better detection results. The experimental results show that our method achieves 89.1% and 67.8% mAP@0.5 on the PASCAL VOC and MS COCO datasets, respectively. Compared with YOLOv5_S, our method improves mAP@[0.5:0.95] by 4.4% and 1.4% on the PASCAL VOC and MS COCO datasets, respectively. Compared with the four YOLOv5 models, our method has better detection accuracy for large objects. Notably, on the MS COCO dataset, our method's mAP@[0.5:0.95] on large-scale objects is 5.4% higher than that of YOLOv5_S.
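The abstract does not spell out the EIOU k-means calculation, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that anchors and ground-truth boxes are compared by width and height alone (shared centers, so the EIoU center-distance term vanishes), that each box is assigned to the anchor with the highest EIoU, and that each anchor is updated to the mean of its assigned boxes. The function names `eiou_wh` and `eiou_kmeans` are hypothetical.

```python
import random


def eiou_wh(box, anchor):
    """EIoU of two (w, h) boxes, ASSUMING both share the same center."""
    (bw, bh), (aw, ah) = box, anchor
    inter = min(bw, aw) * min(bh, ah)
    iou = inter / (bw * bh + aw * ah - inter)
    # Width and height of the smallest enclosing box for co-centered boxes.
    cw, ch = max(bw, aw), max(bh, ah)
    # The center-distance penalty is zero here; only the width and height
    # penalties of EIoU remain.
    return iou - (bw - aw) ** 2 / cw ** 2 - (bh - ah) ** 2 / ch ** 2


def eiou_kmeans(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchors using EIoU as the similarity (sketch)."""
    rng = random.Random(seed)
    anchors = [tuple(map(float, b)) for b in rng.sample(boxes, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # Highest EIoU is equivalent to smallest distance 1 - EIoU.
            best = max(range(k), key=lambda j: eiou_wh(b, anchors[j]))
            clusters[best].append(b)
        # Mean update; an empty cluster keeps its previous anchor.
        new = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else anchors[j]
            for j, c in enumerate(clusters)
        ]
        if new == anchors:
            break
        anchors = new
    # Return anchors ordered from smallest to largest area.
    return sorted(anchors, key=lambda a: a[0] * a[1])
```

For example, `eiou_kmeans(boxes, k=9)` on a list of `(width, height)` tuples taken from the training labels would return nine candidate anchor shapes. Compared with plain IoU, the extra width and height penalties make a box less similar to an anchor with a mismatched aspect ratio even when the overlap area is comparable.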

CS-R-FCN: Cross-Supervised Learning for Large-Scale Object Detection

$\mathcal{R}^2$-CNN: Fast Tiny Object Detection in Large-Scale Remote Sensing Images

Hierarchical Structure and Joint Training for Large Scale Semi-supervised Object Detection

CFCG: Semi-Supervised Semantic Segmentation Via Cross-Fusion and Contour Guidance Supervision

Self-supervised co-salient object detection via feature correspondence at multiple scales

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection

CSD3D: Cross-Scale Distillation Via Dual-Consistency Learning for Semi-Supervised 3D Object Detection

Efficient Object Region Discovery for Weakly-supervised Semantic Segmentation

An improved YOLOv5 method for large objects detection with multi-scale feature cross-layer fusion network

A Deep CNN-Based Detection Method for Multi-Scale Fine-Grained Objects in Remote Sensing Images

Exploiting Cross-scale Consistency for Object Detection in Aerial Images

A fast self-attention cascaded network for object detection in large scene remote sensing images

Residual Learning for Salient Object Detection

Discriminative Cross-Modal Transfer Learning and Densely Cross-Level Feedback Fusion for RGB-D Salient Object Detection

CaT: Weakly Supervised Object Detection with Category Transfer

Global Guided Cross-Modal Cross-Scale Network for RGB-D Salient Object Detection

S-CNN: Subcategory-Aware Convolutional Networks for Object Detection

Single-Shot Object Detection via Feature Enhancement and Channel Attention

Cross-Scale Feature Propagation Network for Semantic Segmentation of High-Resolution Remote Sensing Images

Dynamic Supervisor for Cross-dataset Object Detection

Augmenting Strong Supervision Using Web Data for Fine-Grained Categorization