A novel finetuned YOLOv6 transfer learning model for real-time object detection

Chhaya Gupta, Nasib Singh Gill, Preeti Gulia, Jyotir Moy Chatterjee
DOI: https://doi.org/10.1007/s11554-023-01299-3
Impact factor (IF): 2.293
2023-04-12
Journal of Real-Time Image Processing
Abstract: Object detection and object recognition are among the most important applications of computer vision. Detecting objects efficiently requires a model with high detection accuracy, but increasing a model's detection accuracy typically increases its size and computational cost, which makes deep learning difficult to use in embedded environments. To address this problem, this research proposes a transfer-learning-based model for real-time object detection that improves the effectiveness of the YOLO algorithm, using YOLOv6 as the baseline model. The study proposes a pruning and finetuning algorithm, as well as a transfer learning algorithm, to improve the proposed model's detection accuracy and inference speed. The paper also describes how the proposed model identifies all objects in a scene, both indoor and outdoor, and provides voice output that warns the user about nearby and faraway objects. The audio feedback is generated with the Google Text-to-Speech (gTTS) library. The model is trained on the MS-COCO dataset and compared with the TensorFlow Single Shot Detector (SSD), Faster R-CNN, Mask R-CNN, YOLOv4, and baseline YOLOv6 models. After the YOLOv6 baseline model is pruned by 30%, 40%, and 50%, the finetuned YOLOv6 framework achieves 37.8% higher average precision (AP) at 1235 frames per second (FPS).
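The abstract does not say which pruning criterion the authors use. The following is a minimal sketch that assumes global unstructured magnitude (L1) pruning on a flat list of weights. It shows what pruning a layer by 30%, 40%, and 50% could look like, followed by a finetuning step that holds the pruned weights at zero. The helper names `magnitude_prune`, `sparsity`, and `masked_sgd_step`, the toy layer values, and the plain-SGD update are all hypothetical illustrations and do not represent the paper's algorithm. A real YOLOv6 model would be pruned tensor by tensor inside a deep learning framework.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], ratio: float) -> Tuple[List[float], List[int]]:
    """Zero the `ratio` fraction of weights with the smallest absolute value.

    Assumption: unstructured L1-magnitude pruning; the abstract does not state
    the criterion the authors use. Returns the pruned weights and a 0/1 mask.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    n_prune = round(len(weights) * ratio)
    # Rank indices by |w|; Python's sort is stable, so ties keep their original order.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    mask = [0 if i in pruned_idx else 1 for i in range(len(weights))]
    pruned = [w if m else 0.0 for w, m in zip(weights, mask)]
    return pruned, mask


def sparsity(weights: List[float]) -> float:
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


def masked_sgd_step(weights: List[float], grads: List[float],
                    mask: List[int], lr: float = 0.01) -> List[float]:
    """One hypothetical finetuning step: plain SGD, with pruned weights held at zero."""
    return [w - lr * g if m else 0.0 for w, g, m in zip(weights, grads, mask)]


if __name__ == "__main__":
    # Hypothetical toy layer; real YOLOv6 layers hold many more weights.
    layer = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.6, -0.15, 0.08]
    for ratio in (0.3, 0.4, 0.5):
        pruned, mask = magnitude_prune(layer, ratio)
        print(f"ratio={ratio}: sparsity={sparsity(pruned):.1f}")
```

On this ten-weight toy layer, each requested ratio yields the same sparsity (0.3, 0.4, and 0.5). The mask is what makes "prune, then finetune" work: without it, gradient updates would gradually revive the zeroed weights. In PyTorch, `torch.nn.utils.prune.l1_unstructured(module, name, amount)` applies a comparable magnitude criterion through a mask on a module's parameter.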
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Imaging Science & Photographic Technology
What problem does this paper attempt to address?