Differential Image-Based Scalable YOLOv7-Tiny Implementation for Clustered Embedded Systems

Sunghoon Hong, Daejin Park
DOI: https://doi.org/10.1109/tits.2024.3419095
IF: 8.5
2024-11-05
IEEE Transactions on Intelligent Transportation Systems
Abstract: Convolutional neural networks (CNNs) are gaining popularity in artificial intelligence for their powerful visual image analysis. Compared with other artificial neural networks, CNNs add many convolutional layers, which improve visual image analysis by extracting the feature maps required for image classification. However, algorithm optimization is required to run low-latency applications on edge compute modules with limited processing resources. In this paper, we propose a novel algorithm optimization method for fast CNNs that uses continuous differential images. The main idea is to variably reduce computation by using the differential value of the input in each convolutional layer. The proposed method is compatible with all types of CNNs, and it performs better when the pixel-value difference between consecutive images is small. We use the DarkNet framework to evaluate our algorithm with fast convolution and half convolution approaches on a clustered system. When the input frame rate is 10 fps, FLOPs are reduced by a factor of about 4.92 compared to the original YOLOv7-tiny. By reducing the FLOPs of the convolutional layers, the inference speed increases to about 4.86 FPS, 1.57 times faster than the original YOLOv7-tiny. With parallel processing on two edge compute modules using the half convolution approach, FLOPs are reduced further and the response speed improves. In addition, faster object detection is possible by expanding the scalable clustered embedded system to up to 7 compute modules, as the user requires.
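The core idea in the abstract, reducing computation by working on the difference between consecutive inputs, can be illustrated with a minimal sketch. It relies only on the linearity of convolution: conv(x_t) = conv(x_{t-1}) + conv(x_t - x_{t-1}), so unchanged pixels contribute nothing to the update. This is not the authors' DarkNet implementation. The function names `conv2d_valid` and `conv2d_differential`, the single-channel "valid" convolution, and the exact-zero change test are all assumptions made for illustration, and the sketch ignores nonlinear activations between layers.

```python
def conv2d_valid(image, kernel):
    """Dense 'valid' 2D cross-correlation on nested lists (hypothetical helper)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            out[i][j] = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
    return out


def conv2d_differential(prev_out, prev_img, cur_img, kernel):
    """Update the previous output using only the pixels that changed.

    Returns the new output and the number of multiply-accumulates performed.
    Illustrative only: assumes one channel and no padding or stride.
    """
    kh, kw = len(kernel), len(kernel[0])
    out = [row[:] for row in prev_out]  # start from the previous frame's result
    oh, ow = len(out), len(out[0])
    macs = 0
    for y in range(len(cur_img)):
        for x in range(len(cur_img[0])):
            d = cur_img[y][x] - prev_img[y][x]
            if d == 0:
                continue  # unchanged pixel: no work needed
            # Pixel (y, x) feeds output (i, j) through kernel tap (y - i, x - j).
            for a in range(kh):
                i = y - a
                if not 0 <= i < oh:
                    continue
                for b in range(kw):
                    j = x - b
                    if 0 <= j < ow:
                        out[i][j] += d * kernel[a][b]
                        macs += 1
    return out, macs


# Usage: two 5x5 frames that differ in a single pixel.
frame_prev = [[(r * 5 + c) % 7 for c in range(5)] for r in range(5)]
frame_cur = [row[:] for row in frame_prev]
frame_cur[2][2] += 3
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

out_prev = conv2d_valid(frame_prev, kernel)
out_cur, macs = conv2d_differential(out_prev, frame_prev, frame_cur, kernel)
```

The fewer pixels change between frames, the fewer multiply-accumulates the update performs, which matches the abstract's remark that the method performs better when the pixel-value difference between consecutive images is small.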
engineering, electrical & electronic, transportation science & technology, civil