A Swin transformer-functionalized lightweight YOLOv5s for real-time coal–gangue detection

DOI: https://doi.org/10.1007/s11554-023-01305-8
IF: 2.293
2023-04-25
Journal of Real-Time Image Processing
Abstract: Although many algorithms based on convolutional neural networks (CNNs) have been proposed for coal–gangue detection under complex production conditions, Transformers have rarely been applied to coal–gangue detection networks. Here, a lightweight CNN- and Transformer-based coal–gangue detection network is built by introducing Swin Transformer blocks to promote feature fusion and achieve accurate localization and identification. The Transformer enables interaction between long-distance semantic information and incorporates more semantic information into low-level features. The α-IoU loss is further used to regress bounding boxes accurately. Comparison with the heatmaps output by the original network shows that the modified network accurately captures the area where the target lies rather than irrelevant background. Images acquired under three illuminances served as test datasets (A₁, A₂, and A₃) to assess the model's illumination robustness. The results indicate that YOLOv5-Swin has the best illumination adaptability in coal–gangue detection. Compared with the original YOLOv5s, mAP on A₁, A₂, and A₃ increases by 2.53%, 2.4%, and 2.84%, respectively, while detection speed reaches 147 FPS, twice as fast as YOLOv3. The method meets the needs of real-time detection and can detect coal and gangue accurately and quickly.
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Imaging Science & Photographic Technology
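
To make the α-IoU loss named in the abstract concrete, here is a minimal sketch of its basic form, 1 − IoU^α, for two axis-aligned boxes. It is an illustration, not the paper's implementation. The function names, the (x1, y1, x2, y2) box format, and the default α = 3 are assumptions; the abstract does not state which α or which α-IoU variant the authors used.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def alpha_iou_loss(pred, target, alpha=3.0):
    """Basic alpha-IoU loss: 1 - IoU**alpha (alpha=3.0 is an assumed default)."""
    return 1.0 - iou(pred, target) ** alpha


# Hypothetical boxes: the prediction is shifted one unit right of the target.
pred_box = (1.0, 0.0, 3.0, 2.0)
true_box = (0.0, 0.0, 2.0, 2.0)
print(alpha_iou_loss(pred_box, true_box))             # alpha = 3
print(alpha_iou_loss(pred_box, true_box, alpha=1.0))  # plain IoU loss
```

With α = 1 this reduces to the ordinary IoU loss. An α above 1 raises IoU to a higher power, which changes how strongly boxes at different overlap levels are penalized.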