An effective method for small objects detection based on MDFFAM and LKSPP

Zhoutian Xu, Yadong Xu, Manyi Wang
DOI: https://doi.org/10.1038/s41598-024-60745-9
IF: 4.6
2024-05-05
Scientific Reports
Abstract: Object detection is one of the research hotspots in computer vision. However, most existing object detectors struggle with the identification of small targets. Therefore, the paper proposes two modules: the MDFFAM (Multi-Directional Feature Fusion Attention Mechanism) and the LKSPP (Large Kernel Spatial Pyramid Pooling), to enhance the detector's effectiveness in identifying subtle faults on the surface of mechanical equipment. LKSPP aims to expand the receptive field to capture high-level semantic features through large kernels. Meanwhile, the MDFFAM allows the network to efficiently utilize spatial location information and adaptively recognize detection priorities. In the detection task, MDFFAM effectively captures feature information in three spatial directions: width, height, and channel, with the location information fully utilized to establish stable long-range dependencies. Moreover, LKSPP boasts a larger receptive field and imposes less computational burden compared to the SPPCSPC by YOLOv7. Finally, experiments demonstrate that the proposed module effectively improves the detection accuracy for small targets, surpassing the state-of-the-art object detector, YOLOv7. Remarkably, MDFFAM incurs almost negligible computational overhead.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper addresses small object recognition in mechanical surface defect detection, a task on which most existing object detectors perform poorly. To solve this problem, the authors propose two new modules:

- **MDFFAM** (Multi-Directional Feature Fusion Attention Mechanism): This mechanism uses spatial location information and adaptively identifies detection priorities. By capturing feature information along the width, height, and channel directions, it leverages positional information to establish stable long-range dependencies.
- **LKSPP** (Large Kernel Spatial Pyramid Pooling): This module expands the receptive field through large kernels to capture high-level semantic features. Compared to SPPCSPC in YOLOv7, LKSPP has a larger receptive field and a lighter computational burden.

Combining these two modules, the paper proposes a new object detection model, Slim-YOLO, which improves performance on small object detection tasks, especially mechanical surface defect detection. Compared to the real-time object detector YOLOv7, Slim-YOLO improves detection accuracy while maintaining lower computational complexity.

The experiments use the NEU-DET dataset, which contains 6 typical types of mechanical surface defects. Slim-YOLO is compared against the YOLO series and other advanced models (such as the YOLOR series). It leads the baseline models on the mAP50 metric while also having a lower parameter count and computational cost, indicating that its core components, MDFFAM and LKSPP, achieve a good balance between accuracy and computational efficiency.
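
The receptive-field claim about large kernels can be made concrete with the standard layer-by-layer receptive-field recurrence. The following is a minimal sketch, not the authors' LKSPP implementation: the function name `receptive_field` and the kernel sizes are illustrative assumptions and are not taken from the paper.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of a layer stack.

    `layers` is a list of (kernel_size, stride) pairs, applied in order.
    Uses the standard recurrence: rf += (kernel - 1) * jump; jump *= stride.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the view by (k - 1) steps
        jump *= stride              # striding spreads later steps further apart
    return rf


# Illustrative (hypothetical) configurations, not the paper's settings:
chained_small = [(5, 1)] * 3        # three chained 5x5, stride-1 pooling layers
single_large = [(13, 1)]            # one 13x13, stride-1 kernel

print(receptive_field(chained_small))  # 13
print(receptive_field(single_large))   # 13
```

Under these assumptions, one large kernel covers the same input extent as several chained small ones, which is why kernel size is a direct lever on the receptive field. How LKSPP arranges its kernels, and how it keeps the computational cost below that of SPPCSPC, is specified in the paper itself.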