YOLOv3-MT: A YOLOv3 using multi-target tracking for vehicle visual detection

Kun Wang, Maozhen Liu
DOI: https://doi.org/10.1007/s10489-021-02491-3
IF: 5.3
2021-06-04
Applied Intelligence
Abstract: In autonomous driving, complex backgrounds and mutual occlusion between multiple targets hinder the detector's judgment and cause missed detections. When a close-range target is captured again, the vehicle may not be able to respond in time, which can cause a fatal accident. Auxiliary driving systems therefore require a model that can accurately identify partially occluded targets in complex backgrounds and perform short-term tracking and early warning for completely occluded objects. This paper proposes a YOLOv3-based method that improves detection accuracy while supporting real-time operation and provides real-time warnings for objects that are completely occluded. First, we obtain more suitable prior (anchor) box settings through class-wise K-means clustering. To address the problem that the max-pooling operation of the original CBAM easily introduces background noise, we propose AS-CBAM (Adaptive Selection Convolutional Block Attention Module) and combine it with HDC (Hybrid Dilated Convolution) to maximize the receptive field and fine-tune the features. A 1×1 convolution is used to limit the growth in the number of parameters. DIoU-NMS replaces traditional NMS. In addition, a tracking algorithm based on Kalman filtering and Hungarian matching is introduced to improve the system's ability to recognize occluded objects. Compared with the original YOLOv3, the proposed method increases mAP by 1.32% and 1.47% on KITTI and UA-DETRAC, respectively. It also runs at 35.07 FPS and shows a larger accuracy improvement (90.36% vs. 85.71%) on Object-Mask, a dataset that focuses on occlusion conditions. The proposed algorithm is therefore more suitable for autonomous driving applications.
computer science, artificial intelligence
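The abstract states that DIoU-NMS replaces traditional NMS but gives no implementation details. The following is a minimal sketch of the commonly used DIoU-NMS formulation, not the authors' code: a candidate box is suppressed when its IoU with the highest-scoring box, minus the squared center distance normalized by the squared diagonal of the smallest enclosing box, reaches a threshold. The function names `diou` and `diou_nms`, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are assumptions made for illustration.

```python
def diou(a, b):
    """Distance-IoU of two boxes given as (x1, y1, x2, y2)."""
    # Intersection area (zero when the boxes do not overlap).
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centers.
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
       + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c2 = cw ** 2 + ch ** 2

    return iou - (d2 / c2 if c2 > 0 else 0.0)


def diou_nms(boxes, scores, threshold=0.5):
    """Return indices of kept boxes, highest score first.

    threshold is an assumed default, not a value from the paper.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Keep only candidates whose DIoU with the selected box is below the threshold.
        order = [i for i in order if diou(boxes[best], boxes[i]) < threshold]
    return keep
```

Because the center-distance penalty lowers the score used for suppression, two boxes that overlap heavily but have clearly separated centers are less likely to be merged into one detection than under plain IoU-based NMS, which is the behavior one would want for mutually occluding vehicles.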