-RepYOLO: Real-Time Object Detection Method Based on -RepConv and YOLOv8

Shuai Feng, Huaming Qian, Huilin Wang, Wenna Wang
DOI: https://doi.org/10.1007/s11554-024-01462-4
IF: 2.293
2024-05-09
Journal of Real-Time Image Processing
Abstract: Deep learning-based object detection methods often suffer from excessive model parameters, high complexity, and poor real-time performance. In response, researchers have developed the YOLO series, particularly the YOLOv5s to YOLOv8s models, to balance real-time processing against accuracy. Nevertheless, YOLOv8's precision can fall short in certain applications. To address this, we introduce -RepYOLO, a real-time object detection method built on the -RepConv structure and designed to improve accuracy while maintaining detection speed. First, we design a backbone network, -EfficientRep, built from two purpose-designed units, the -RepConv and the -RepC2f module, which are reparameterized to produce an efficient inference model. This backbone achieves superior performance by extracting detailed feature maps from images. Next, we propose two enhanced detection necks, -RepPANet and -RepAFPN, which incorporate the -RepC2f module to improve feature fusion. Finally, we develop an improved decoupled detection head in which -RepConv replaces the conventional convolution, markedly increasing detection precision at inference. With the -RepPANet and -RepAFPN necks, -RepYOLO achieves an mAP of 84.77%/85.65% on the PASCAL VOC07+12 dataset and an AP of 45.3%/45.8% on the MS COCO dataset, respectively, a significant improvement over YOLOv8s. In addition, -RepYOLO has 10.8M/8.8M parameters, 3.6%/21.4% fewer than YOLOv8s, yielding a more compact detection model. The detection speeds measured on an RTX 3060 are 116 FPS/81 FPS, a substantial improvement over YOLOv8s.
In summary, our approach delivers competitive performance and offers a more lightweight alternative to state-of-the-art (SOTA) YOLO models, making it a robust choice for real-time object detection applications.
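The abstract does not detail how -RepConv is reparameterized. For orientation only, the following is a minimal sketch of the generic RepVGG-style structural reparameterization that Rep-style blocks commonly build on, and it is an assumption rather than the authors' exact design. At training time the block sums three branches (3×3 conv + BN, 1×1 conv + BN, identity + BN). For inference, each BN is folded into its convolution, the 1×1 and identity kernels are zero-padded to 3×3, and the three kernels and biases are summed into a single 3×3 convolution. All function names (`conv3x3`, `fold_bn`, `reparameterize`, and so on) are hypothetical. The sketch assumes stride 1, padding 1, and equal input/output channels so that the identity branch exists, and it uses plain nested lists to stay dependency-free.

```python
import math
import random

# Hypothetical sketch of RepVGG-style structural reparameterization (not the
# authors' exact -RepConv). Tensors are nested lists: x is [C][H][W], 3x3
# kernels are [C_out][C_in][3][3], and a BN layer is (gamma, beta, mean, var).
EPS = 1e-5

def conv3x3(x, w, b):
    """3x3 convolution with stride 1 and zero padding 1 (output stays H x W)."""
    h, wd = len(x[0]), len(x[0][0])
    out = []
    for o in range(len(w)):
        plane = [[b[o]] * wd for _ in range(h)]
        for i in range(len(x)):
            for r in range(h):
                for c in range(wd):
                    for dr in range(3):
                        for dc in range(3):
                            rr, cc = r + dr - 1, c + dc - 1
                            if 0 <= rr < h and 0 <= cc < wd:
                                plane[r][c] += w[o][i][dr][dc] * x[i][rr][cc]
        out.append(plane)
    return out

def batch_norm(y, bn):
    """Inference-mode BN per channel: gamma * (y - mean) / sqrt(var + eps) + beta."""
    gamma, beta, mean, var = bn
    return [[[gamma[o] * (v - mean[o]) / math.sqrt(var[o] + EPS) + beta[o] for v in row]
             for row in y[o]] for o in range(len(y))]

def fold_bn(w, bn):
    """Fold BN into a bias-free conv and return the equivalent (kernel, bias)."""
    gamma, beta, mean, var = bn
    kernels, biases = [], []
    for o in range(len(w)):
        scale = gamma[o] / math.sqrt(var[o] + EPS)
        kernels.append([[[scale * v for v in row] for row in k] for k in w[o]])
        biases.append(beta[o] - mean[o] * scale)
    return kernels, biases

def pad_1x1(w1):
    """Turn [C_out][C_in] 1x1 weights into 3x3 kernels with the weight at the centre."""
    return [[[[0.0, 0.0, 0.0], [0.0, v, 0.0], [0.0, 0.0, 0.0]] for v in row] for row in w1]

def identity_kernel(c):
    """3x3 kernels that copy channel i to channel i (the identity branch)."""
    return pad_1x1([[1.0 if o == i else 0.0 for i in range(c)] for o in range(c)])

def multi_branch(x, w3, bn3, w1, bn1, bn_id):
    """Training-time block: BN(conv3x3) + BN(conv1x1) + BN(identity)."""
    zeros = [0.0] * len(w3)
    y3 = batch_norm(conv3x3(x, w3, zeros), bn3)
    y1 = batch_norm(conv3x3(x, pad_1x1(w1), zeros), bn1)  # padded 1x1 == 1x1 conv here
    yid = batch_norm(x, bn_id)
    return [[[a + b + d for a, b, d in zip(r3, r1, rid)] for r3, r1, rid in zip(p3, p1, pid)]
            for p3, p1, pid in zip(y3, y1, yid)]

def reparameterize(w3, bn3, w1, bn1, bn_id):
    """Merge the three branches into one 3x3 kernel and bias for inference."""
    c = len(w3)
    parts = [fold_bn(w3, bn3), fold_bn(pad_1x1(w1), bn1), fold_bn(identity_kernel(c), bn_id)]
    kernel = [[[[sum(k[o][i][r][s] for k, _ in parts) for s in range(3)] for r in range(3)]
               for i in range(c)] for o in range(c)]
    bias = [sum(bb[o] for _, bb in parts) for o in range(c)]
    return kernel, bias

def random_bn(c):
    return ([random.uniform(0.5, 1.5) for _ in range(c)],   # gamma
            [random.uniform(-1, 1) for _ in range(c)],      # beta
            [random.uniform(-1, 1) for _ in range(c)],      # running mean
            [random.uniform(0.5, 2.0) for _ in range(c)])   # running var

random.seed(0)
C, H, W = 2, 4, 5
x = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w3 = [[[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(C)]
      for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C)]
bn3, bn1, bn_id = random_bn(C), random_bn(C), random_bn(C)

reference = multi_branch(x, w3, bn3, w1, bn1, bn_id)
kernel, bias = reparameterize(w3, bn3, w1, bn1, bn_id)
fused = conv3x3(x, kernel, bias)
max_err = max(abs(p - q) for P, Q in zip(reference, fused)
              for rp, rq in zip(P, Q) for p, q in zip(rp, rq))
print(f"max |multi-branch - fused| = {max_err:.2e}")
```

Because convolution is linear in its kernel and bias, the single fused 3×3 convolution reproduces the multi-branch output up to floating-point rounding. This is what lets a reparameterized block keep its richer training-time structure while paying only the cost of one convolution at inference.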
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Imaging Science & Photographic Technology