YOLOv1 to YOLOv10: The fastest and most accurate real-time object detection systems

Chien-Yao Wang, Hong-Yuan Mark Liao
2024-08-18
Abstract: This is a comprehensive review of the YOLO series of systems. Unlike previous literature surveys, this review re-examines the characteristics of the YOLO series from the latest technical perspective. We also analyze how the YOLO series has continued to influence and promote research in real-time computer vision, and how it has driven the subsequent development of computer vision and language models. We take a closer look at how the methods proposed by the YOLO series over the past ten years have affected the development of subsequent technologies, and we show applications of YOLO in various fields. We hope this article can serve as a useful guide for future work in real-time computer vision.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper primarily reviews the development history of the YOLO (You Only Look Once) series and its impact on real-time computer vision. YOLO is an efficient real-time object detection system that has evolved from YOLOv1 to YOLOv10, with each version building on and refining its predecessors.

### Main Issues Addressed

1. **Real-time Object Detection**: The YOLO series aims to make object detection fast enough for true real-time processing in practical applications such as autonomous driving and industrial robotics.
2. **Balance Between Detection Accuracy and Speed**: While maintaining real-time performance, the YOLO series continually strives to improve detection accuracy, particularly for small objects.
3. **Multi-scale Detection Capability**: By introducing techniques such as Feature Pyramid Networks (FPN), the YOLO series can effectively detect objects of different sizes.
4. **Model Generalization and Adaptability**: The YOLO series is not limited to object detection; it is also widely applied to other computer vision tasks such as instance segmentation and pose estimation, and it runs efficiently on a range of hardware platforms.
5. **Model Scalability and Optimization**: With each version, the YOLO series has introduced advanced training techniques and architectural designs that improve the model's generalization ability and performance.

### Specific Progress

- **YOLOv1**: Proposed a unified one-stage detection method that directly predicts bounding boxes and class probabilities for each grid cell, greatly simplifying the object detection pipeline.
- **YOLOv2**: Introduced techniques such as dimension clustering, direct location prediction, and fine-grained features, further improving detection speed and accuracy.
- **YOLOv3**: Combined multi-scale prediction with a residual backbone network (Darknet-53), significantly improving the detection of small objects.
- **YOLOv4**: Integrated a "bag of freebies" (training-time techniques such as data augmentation) and a "bag of specials" (plug-in modules with a small inference cost, such as attention mechanisms), greatly improving detection performance.
- **Subsequent Versions** (e.g., YOLOv5 and YOLOv6): Continued to optimize model architecture, training strategies, and hardware friendliness to meet the needs of different application scenarios.

In summary, this paper comprehensively reviews the development history and technical characteristics of the YOLO series, emphasizing its important role in advancing real-time computer vision research and development.
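The YOLOv1 entry above says the model predicts boxes and class probabilities per grid cell. The following minimal Python sketch shows what that means for the output layout and for choosing which cell is responsible for an object. It assumes the configuration reported in the original YOLOv1 paper for PASCAL VOC (an S = 7 grid, B = 2 boxes per cell, C = 20 classes); the function names are hypothetical and chosen only for illustration.

```python
"""Sketch of a YOLOv1-style output layout and grid-cell assignment.

Assumes the PASCAL VOC setting from the YOLOv1 paper (S=7, B=2, C=20).
All helper names are hypothetical, for illustration only.
"""


def output_shape(S: int, B: int, C: int) -> tuple[int, int, int]:
    # Each of the S x S cells predicts B boxes (x, y, w, h, confidence)
    # plus one set of C class probabilities shared by those boxes.
    return (S, S, B * 5 + C)


def responsible_cell(cx: float, cy: float, S: int) -> tuple[int, int]:
    """Return (row, col) of the cell that contains an object's center.

    cx, cy are the box center in normalized image coordinates [0, 1].
    """
    col = min(int(cx * S), S - 1)  # clamp so cx == 1.0 stays in the grid
    row = min(int(cy * S), S - 1)
    return row, col


def cell_offsets(cx: float, cy: float, S: int) -> tuple[float, float]:
    # The center is regressed as an offset inside the responsible cell.
    row, col = responsible_cell(cx, cy, S)
    return cx * S - col, cy * S - row


if __name__ == "__main__":
    print(output_shape(7, 2, 20))         # (7, 7, 30)
    print(responsible_cell(0.5, 0.3, 7))  # (2, 3)
```

With S = 7, B = 2, and C = 20, each cell carries 2 × 5 + 20 = 30 values, giving the 7 × 7 × 30 output tensor. Only the cell that contains an object's center is responsible for predicting that object.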
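Direct location prediction, listed under YOLOv2, constrains each predicted box center to the grid cell that produced it. A sketch of the decoding step follows, using the parameterization from the YOLOv2 (YOLO9000) paper: b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w·e^(t_w), b_h = p_h·e^(t_h), where (c_x, c_y) is the cell's top-left offset and (p_w, p_h) is the prior (anchor) size. The function names and the use of grid units as the coordinate system are assumptions of this example.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_box(t, cell, prior):
    """Decode raw outputs (tx, ty, tw, th) into (bx, by, bw, bh) in grid units.

    cell  = (cx, cy): top-left corner of the grid cell, in cells.
    prior = (pw, ph): anchor (prior) width and height, in cells.
    Names are hypothetical; this is a sketch, not a reference implementation.
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = prior
    bx = sigmoid(tx) + cx   # sigmoid keeps the center within this cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)  # exp rescales the prior; size stays positive
    bh = ph * math.exp(th)
    return bx, by, bw, bh


if __name__ == "__main__":
    # Zero outputs place the center mid-cell with exactly the prior's size.
    print(decode_box((0.0, 0.0, 0.0, 0.0), (3, 2), (1.5, 2.0)))
    # (3.5, 2.5, 1.5, 2.0)
```

Because σ maps any real input into the range (0, 1), a prediction cannot move its center into a neighboring cell; the YOLOv2 authors reported that this constrained parameterization is easier to learn and makes training more stable.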
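Dimension clustering, also listed under YOLOv2, chooses anchor (prior) shapes by running k-means on the widths and heights of training boxes, using the distance d(box, centroid) = 1 - IoU(box, centroid) instead of Euclidean distance so that large boxes do not dominate the error. The sketch below compares boxes as if they shared a center and updates each centroid with the mean width and height of its members. The helper names and the mean update are illustrative choices; other implementations differ (for example, by using the median).

```python
import random


def iou_wh(a, b):
    """IoU of two boxes given only (w, h), assuming they share a center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster a list of (w, h) pairs with distance d = 1 - IoU(box, centroid).

    Hypothetical helper for illustration; returns k (w, h) centroids.
    """
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # initialize from k distinct boxes
    for _ in range(iters):
        # Assignment step: each box joins the centroid with the smallest 1 - IoU.
        clusters = [[] for _ in range(k)]
        for box in boxes:
            dists = [1.0 - iou_wh(box, c) for c in centroids]
            clusters[dists.index(min(dists))].append(box)
        # Update step: mean width and height per cluster; keep empty ones as-is.
        new = []
        for c, members in zip(centroids, clusters):
            if members:
                new.append((sum(w for w, _ in members) / len(members),
                            sum(h for _, h in members) / len(members)))
            else:
                new.append(c)
        if new == centroids:  # assignments stopped changing
            break
        centroids = new
    return centroids


if __name__ == "__main__":
    toy = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (10.0, 10.0), (11.0, 9.0), (9.0, 11.0)]
    print(sorted(kmeans_anchors(toy, 2), key=lambda c: c[0] * c[1]))
```

The YOLOv2 paper reports choosing k = 5 as a tradeoff between model complexity and recall; on a real dataset, the input would be the (w, h) pairs of all ground-truth boxes, typically normalized by image or grid size.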