Abstract: Deep learning-based object detection methods often suffer from large parameter counts, high complexity, and poor real-time performance. The YOLO series, in particular YOLOv5s through YOLOv8s, was developed to balance real-time processing against accuracy. Nevertheless, the precision of YOLOv8 can fall short in some applications. To address this, we introduce -RepYOLO, a real-time object detection method built on the -RepConv structure and designed to improve accuracy while maintaining detection speed. First, we design a backbone network, -EfficientRep, built from two network units, the -RepConv and -RepC2f modules, which are reparameterized to produce an efficient inference model that extracts detailed feature maps from images. Second, we propose the enhanced -RepPANet and -RepAFPN as alternative detection necks, adding -RepC2f to improve feature fusion. Third, we develop a decoupled detection head in which -RepConv replaces the conventional convolution, which markedly increases detection precision at inference. With the -RepPANet/-RepAFPN necks, -RepYOLO achieves mAP of 84.77%/85.65% on the PASCAL VOC07+12 dataset and AP of 45.3%/45.8% on the MSCOCO dataset, respectively, a significant improvement over YOLOv8s. The model parameters of -RepYOLO are reduced to 10.8M/8.8M, which is 3.6%/21.4% fewer than YOLOv8, yielding a more streamlined detection model. Detection speeds measured on an RTX3060 are 116 FPS/81 FPS, a substantial improvement over YOLOv8s. In summary, our approach delivers competitive performance and is a more lightweight alternative to the SOTA YOLO models, making it a robust choice for real-time object detection applications.
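The abstract does not spell out the internal structure of -RepConv, so the following is only a minimal sketch of the general idea of structural reparameterization, assuming a RepVGG-style block with parallel 3x3 and 1x1 convolution branches. Batch normalization and biases are omitted for brevity, and the helper names (`conv2d`, `merge_3x3_and_1x1`, `rand_tensor`) are hypothetical, not taken from the paper. The sketch checks that the two training-time branches can be folded into a single 3x3 kernel that gives the same output at inference.

```python
import random

def conv2d(x, w):
    """Stride-1, zero-padded ("same") convolution in pure Python.

    x: input as [C_in][H][W]; w: weights as [C_out][C_in][k][k], k odd.
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    p = k // 2
    out = []
    for wo in w:
        plane = [[0.0] * wd for _ in range(h)]
        for i in range(h):
            for j in range(wd):
                acc = 0.0
                for c in range(c_in):
                    for di in range(k):
                        for dj in range(k):
                            ii, jj = i + di - p, j + dj - p
                            # Positions outside the image count as zero padding.
                            if 0 <= ii < h and 0 <= jj < wd:
                                acc += wo[c][di][dj] * x[c][ii][jj]
                plane[i][j] = acc
        out.append(plane)
    return out

def merge_3x3_and_1x1(w3, w1):
    """Fold a parallel 1x1 branch into the centre tap of a 3x3 kernel."""
    merged = [[[row[:] for row in kern] for kern in wo] for wo in w3]
    for o, wo in enumerate(w1):
        for c, kern in enumerate(wo):
            merged[o][c][1][1] += kern[0][0]
    return merged

def rand_tensor(*shape, rng):
    """Nested lists of the given shape filled with uniform values in [-1, 1]."""
    if len(shape) == 1:
        return [rng.uniform(-1, 1) for _ in range(shape[0])]
    return [rand_tensor(*shape[1:], rng=rng) for _ in range(shape[0])]

rng = random.Random(0)
x = rand_tensor(2, 5, 5, rng=rng)      # 2 input channels, 5x5 image
w3 = rand_tensor(3, 2, 3, 3, rng=rng)  # 3x3 branch, 3 output channels
w1 = rand_tensor(3, 2, 1, 1, rng=rng)  # 1x1 branch, 3 output channels

# Training-time form: two branches evaluated separately, then summed.
y3, y1 = conv2d(x, w3), conv2d(x, w1)
y_train = [[[a + b for a, b in zip(r3, r1)] for r3, r1 in zip(p3, p1)]
           for p3, p1 in zip(y3, y1)]

# Inference-time form: one convolution with the merged kernel.
y_infer = conv2d(x, merge_3x3_and_1x1(w3, w1))

max_diff = max(abs(a - b)
               for pt, pi in zip(y_train, y_infer)
               for rt, ri in zip(pt, pi)
               for a, b in zip(rt, ri))
print(max_diff < 1e-9)
```

Because convolution is linear in its weights, the merged kernel reproduces the two-branch output up to floating-point rounding while needing only one convolution per layer at inference. This is the usual reason reparameterized blocks can add training-time capacity without slowing detection.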

YOLO-Former: YOLO Shakes Hand With ViT

An Object Detection Method Based on Improved YOLOX

YOLO9000: Better, Faster, Stronger

YOLOPose V2: Understanding and Improving Transformer-based 6D Pose Estimation

YOLOv10: Real-Time End-to-End Object Detection

YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness

YOLO-SDH: improved YOLOv5 using scaled decoupled head for object detection

PP-YOLOE: An evolved version of YOLO

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

-repyolo: real-time object detection method based on -RepConv and YOLOv8

YOLOv3: An Incremental Improvement

YOLO-World: Real-Time Open-Vocabulary Object Detection

YOLO_SRv2: An evolved version of YOLO_SR

M2YOLOF: Based on effective receptive fields and multiple-in-single-out encoder for object detection

SF-YOLOv5: Improved YOLOv5 with swin transformer and fusion-concat method for multi-UAV detection

YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications

YOLO-B: An infrared target detection algorithm based on bi-fusion and efficient decoupled

YOLO-Extract: Improved YOLOv5 for Aircraft Object Detection in Remote Sensing Images

TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios

R-YOLO: A YOLO-Based Method for Arbitrary-Oriented Target Detection in High-Resolution Remote Sensing Images

Dq-YOLOF: An Effective Improvement with Deformable Convolution and Sample Quality Optimization Based on the YOLOF Detector