EDSD: Efficient Driving Scenes Detection Based on Swin Transformer

Wei Chen, Ruihan Zheng, Jiade Jiang, Zijian Tian, Fan Zhang, Yi Liu
DOI: https://doi.org/10.1007/s11042-024-19622-w
IF: 2.577
2024-01-01
Multimedia Tools and Applications
Abstract: In autonomous driving, detecting targets such as vehicles, bicycles, and pedestrians in complex road conditions is of great importance. Through extensive experimentation, we found that vehicle targets generally occupy large regions of the image but are easily occluded, while small targets such as pedestrians usually appear in dense groups. Detecting targets of such different sizes remains an important challenge for current detectors. To address this issue, we propose a novel hierarchical feature pyramid network structure. This structure comprises a series of CNN-Transformer variant layers, each of which stacks CST neural network modules with Swin Transformer modules. In addition, because the heavy computation of the global self-attention mechanism makes it difficult to apply in autonomous driving, we adopt the shifted window method in SwinFM, which effectively accelerates inference by computing self-attention within local windows rather than over the whole image. This study uses the Swin Transformer as a baseline. Compared to the baseline, our EDSD model improves the average accuracy by 1.8
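To make the cost argument for window-based attention concrete, the following is a minimal Python sketch that compares the approximate cost of global self-attention with that of attention restricted to local windows. It uses the complexity estimates given in the original Swin Transformer paper, not figures from EDSD itself. The function names and the example sizes (a 56 x 56 feature map, 96 channels, 7 x 7 windows, as in the first stage of Swin-T) are assumptions chosen for illustration only.

```python
def global_msa_flops(h: int, w: int, c: int) -> int:
    """Approximate cost of global multi-head self-attention on an
    h x w feature map with c channels (quadratic in h * w)."""
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def window_msa_flops(h: int, w: int, c: int, m: int) -> int:
    """Approximate cost when attention is computed only inside
    non-overlapping m x m windows (linear in h * w for fixed m)."""
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


if __name__ == "__main__":
    # Illustrative sizes (assumed, not taken from the EDSD paper).
    h, w, c, m = 56, 56, 96, 7
    g = global_msa_flops(h, w, c)
    win = window_msa_flops(h, w, c, m)
    print(f"global attention:   {g:,}")
    print(f"windowed attention: {win:,}")
    print(f"ratio:              {g / win:.1f}x")
```

The first term (the linear projections) is identical in both cases; only the attention term changes, from quadratic in the number of tokens to linear for a fixed window size. The gap therefore widens as image resolution grows, which is why the approach matters for high-resolution driving scenes.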