STF-YOLO: A Small Target Detection Algorithm for UAV Remote Sensing Images Based on Improved SwinTransformer and Class Weighted Classification Decoupling Head

Yanming Hui, Jue Wang, Bo Li
DOI: https://doi.org/10.1016/j.measurement.2023.113936
IF: 5.6
2023-01-01
Measurement
Abstract: Detecting small targets in UAV images is challenging because of the high flight altitude, high imaging resolution, complex backgrounds, and tilted shooting angles. In recent years, numerous deep learning-based methods have achieved good results in feature extraction, localization, and classification, but they still fall short of the need for faster and more accurate recognition of small objects. This paper proposes a UAV remote sensing image small target detection algorithm called STF-YOLO to address these issues. First, we adopt the Swin Transformer, a vision model built on the Transformer architecture that originated in NLP, and combine it with CNNs to propose a novel convolutional structure called STRCN. Second, we present a lightweight classifier called CNeB, which is incorporated into the backbone and offers high recognition accuracy and parameter efficiency. We further improve prediction accuracy by incorporating SimAM, a parameter-free attention mechanism. Moreover, we propose a novel detection head called CWDHead, which strengthens the weighting of classification capabilities and significantly improves the recognition accuracy of similar inter-class and intra-class instances. Lastly, we design a novel data augmentation method called SVM, which enriches the training dataset and enhances the robustness and generalization ability of the model. The results indicate that, with enhancements such as weighted boxes fusion (WBF) and test-time augmentation (TTA), STF-YOLO significantly outperforms current state-of-the-art methods on the publicly available VisDrone dataset, with gains of 3.9% in mAP and 2.0% in AP50.
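The abstract names SimAM as a parameter-free attention mechanism but does not say how it works or how it is wired into STF-YOLO. As a rough illustration only, the sketch below applies the commonly cited SimAM weighting to a single feature-map channel in plain Python. The function name `simam`, the list-of-lists input, and the default `e_lambda` value are assumptions made for this example; they are not taken from the paper.

```python
import math


def simam(channel, e_lambda=1e-4):
    """Apply SimAM-style attention to one channel (a 2D list of floats).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1  # number of "other" positions in the channel
    if n < 1:
        raise ValueError("channel needs at least two elements")

    mu = sum(values) / len(values)
    # Channel variance estimate, normalised by n as in the usual formulation.
    var = sum((v - mu) ** 2 for v in values) / n

    out = []
    for row in channel:
        new_row = []
        for v in row:
            # Inverse energy: positions far from the channel mean score higher.
            inv_energy = (v - mu) ** 2 / (4.0 * (var + e_lambda)) + 0.5
            weight = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid gate
            new_row.append(v * weight)
        out.append(new_row)
    return out
```

No learnable parameters appear anywhere in the function: each weight is computed from the channel's own mean and variance, which is what "parameter-free" refers to. Positions that stand out from the rest of the channel receive a larger gate value than positions close to the mean.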