Inception‐YOLO: Computational cost and accuracy improvement of the YOLOv5 model based on employing modified CSP, SPPF, and inception modules

Hadi Khodaei Jooshin, Mahdi Nangir, Hadi Seyedarabi
DOI: https://doi.org/10.1049/ipr2.13077
IF: 2.3
2024-03-13
IET Image Processing
Abstract: Inception‐YOLO demands fewer FLOPs and fewer parameters for real-time object detection than models of equal accuracy. The demand for less complex and more accurate architectures has been a priority ever since computer vision came into broad everyday use, for example in self‐driving cars, portable applications, augmented reality systems, and medical image analysis. Many methods have been developed to improve the accuracy and reduce the complexity of object detection, such as the successive generations of R‐CNNs and YOLOs. However, these architectures are not the most efficient possible, and there is still room for improvement. In this study, the fifth version of YOLO (YOLOv5) is employed and an improved architecture, Inception‐YOLO, is presented. The model significantly outperforms the state-of-the-art (SOTA) YOLO family. Specifically, the improvements can be summarised as follows: a substantial reduction in floating point operations (FLOPs) and number of parameters, as well as higher accuracy than models with fewer FLOPs. All the presented approaches, namely the optimized inception module, the proposed structures for CSP and SPPF, and the improved loss function, work together to incrementally improve detection accuracy, memory demand, and FLOPs simultaneously. As a glimpse of performance, the Inception‐YOLO‐S model reaches 38.7% AP with 5.9M parameters and 11.5 BFLOPs, outperforming YOLOv5‐S (37.4% AP, 7.2M parameters, 16.5 BFLOPs).
computer science, artificial intelligence; engineering, electrical & electronic; imaging science & photographic technology
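
The abstract compares the two small models only through raw figures. The short sketch below turns those figures into relative savings so the trade-off is easier to read. It uses only the numbers quoted in the abstract; the helper name `relative_reduction` and the dictionary keys are illustrative and do not come from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


# Figures quoted in the abstract (AP in %, parameters in millions, FLOPs in billions).
yolov5_s = {"ap": 37.4, "params_m": 7.2, "bflops": 16.5}
inception_yolo_s = {"ap": 38.7, "params_m": 5.9, "bflops": 11.5}

param_saving = relative_reduction(yolov5_s["params_m"], inception_yolo_s["params_m"])
flop_saving = relative_reduction(yolov5_s["bflops"], inception_yolo_s["bflops"])
ap_gain = inception_yolo_s["ap"] - yolov5_s["ap"]  # absolute gain in AP points

print(f"parameters: {param_saving:.1%} fewer")  # about 18.1%
print(f"FLOPs:      {flop_saving:.1%} fewer")   # about 30.3%
print(f"AP:         {ap_gain:+.1f} points")     # +1.3
```

On these figures, Inception‐YOLO‐S uses roughly 18% fewer parameters and 30% fewer FLOPs than YOLOv5‐S while gaining 1.3 AP points.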