I-YOLO: a novel single-stage framework for small object detection

Kang Tong, Yiquan Wu
DOI: https://doi.org/10.1007/s00371-024-03284-8
IF: 2.835
2024-02-20
The Visual Computer
Abstract: Small object detection is a challenging task in computer vision. We argue that the large performance gap between small object detectors and normal-sized object detectors stems from two aspects: the small object datasets and the small objects themselves. In terms of datasets, we build a large-scale, high-resolution dataset dubbed Small-PCB to promote detection in the semiconductor industry. For the small objects themselves, we use multi-scale feature learning and a feature fusion strategy to help detect them. More concretely, we devise two novel components to predict small objects better: a re-parameterized module with channel shuffle (RMCS) and a multi-scale feature enhanced convolution (MFEC). MFEC splits the input channels into several parts, applies convolutions of different sizes to each part, and adopts pointwise convolution to fuse the individual channel features. RMCS uses not only structural re-parameterization but also channel shuffle. Channel shuffle can be seen as a fusion of channel features: it strengthens the interaction of feature information between different channel groups, which brings more informative feature clues. Based on RMCS and MFEC, we introduce OIU-RMCS and M-MFEC, respectively. Finally, we build I-YOLO by integrating these two components into a YOLO-based detector. Extensive qualitative and quantitative experimental results indicate that the proposed I-YOLO achieves state-of-the-art performance on the popular AI-TODv2 dataset and on Small-PCB.
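The two channel-level operations named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not give the shuffle variant, the number of channel parts, or the kernel sizes. The sketch assumes the common reshape-and-transpose form of channel shuffle and an even contiguous split of channels across kernel sizes. The function names `channel_shuffle` and `split_channels` and the kernel sizes 3, 5, 7 are hypothetical.

```python
def channel_shuffle(channels, groups):
    """Reorder a flat list of per-channel items with a group shuffle.

    Assumption: the reshape-and-transpose variant. The list is viewed as a
    (groups, channels_per_group) grid, transposed, and flattened again, so
    neighbouring output channels come from different input groups.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # Walk the grid column by column instead of row by row.
    return [channels[g * per_group + j]
            for j in range(per_group)
            for g in range(groups)]


def split_channels(num_channels, kernel_sizes):
    """Plan which channel indices each convolution size would process.

    Assumption: channels are split into contiguous, near-equal parts, one
    part per kernel size. Returns a list of (kernel_size, index_range).
    """
    base, extra = divmod(num_channels, len(kernel_sizes))
    plan, start = [], 0
    for i, k in enumerate(kernel_sizes):
        width = base + (1 if i < extra else 0)  # spread any remainder
        plan.append((k, range(start, start + width)))
        start += width
    return plan


if __name__ == "__main__":
    print(channel_shuffle(list(range(6)), groups=2))
    for k, idx in split_channels(8, [3, 5, 7]):  # hypothetical kernel sizes
        print(f"{k}x{k} conv on channels {list(idx)}")
```

In a real detector each part would pass through its own convolution and the concatenated result through a 1×1 convolution for fusion; the sketch only shows the index bookkeeping behind the split and the shuffle.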
computer science, software engineering