Abstract: Enlarging input images is a straightforward and effective approach to promote small object detection. However, simple image enlargement is significantly expensive on both computations and GPU memory. In fact, small objects are usually sparsely distributed and locally clustered. Therefore, massive feature extraction computations are wasted on the non-target background area of images. Recent works have tried to pick out target-containing regions using an extra network and perform conventional object detection, but the newly introduced computation limits their final performance. In this paper, we propose to reuse the detector's backbone to conduct feature-level object-seeking and patch-slicing, which can avoid redundant feature extraction and reduce the computation cost. Incorporating a sparse detection head, we are able to detect small objects on high-resolution inputs (e.g., 1080P or larger) for superior performance. The resulting Efficient Small Object Detection (ESOD) approach is a generic framework, which can be applied to both CNN- and ViT-based detectors to save the computation and GPU memory costs. Extensive experiments demonstrate the efficacy and efficiency of our method. In particular, our method consistently surpasses the SOTA detectors by a large margin (e.g., 8% gains on AP) on the representative VisDrone, UAVDT, and TinyPerson datasets. Code will be made public soon.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses small-object detection in high-resolution images. Specifically:

1. **Challenges in small-object detection**:
   - Small objects occupy few pixels in the image and carry too little visual information to produce distinctive feature representations.
   - Enlarging the input resolution improves small-object detection, but it sharply increases computational load and GPU memory consumption, which makes it unsuitable for fast detection in practical applications.
2. **Limitations of existing methods**:
   - Most computing resources are wasted on non-target background areas, because small objects are usually sparsely distributed and locally clustered.
   - Some methods generate masks for target areas with an additional network, but this extra computation significantly increases the overall cost.
3. **Solutions in the paper**:
   - The paper proposes Efficient Small Object Detection (ESOD). It reuses the detector's backbone for feature-level object seeking and patch slicing, which avoids redundant feature extraction and reduces the computational cost.
   - Combined with a sparse detection head, it can efficiently detect small objects on high-resolution inputs (such as 1080P or higher).

### Specific methods

1. **ObjSeeker module**:
   - Inserted at an early stage of the detector to find areas that may contain targets.
   - Generates class-agnostic objectness masks using depth-wise separable convolution and standard 1×1 convolution layers.
2. **AdaSlicer module**:
   - Adaptively slices the feature map into small patches according to the objectness masks and discards the non-target background areas.
   - Adopts a greedy strategy or a simplified algorithm to optimize the slicing process and reduce the risk of targets being truncated.
3. **SparseHead module**:
   - Applies sparse convolution directly to the aggregated feature patches to further reduce the computational overhead of the detection head.

### Experimental results

- Extensive experiments were carried out on three representative datasets: VisDrone, UAVDT, and TinyPerson.
- The results show that ESOD significantly outperforms existing state-of-the-art detectors (for example, an 8% improvement in AP) while keeping computational cost and inference time low.

### Summary

The paper addresses the computational efficiency problem of small-object detection in high-resolution images by proposing the ESOD framework, which offers a practical option for real-world applications.
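The idea of discarding background regions according to an objectness mask, as described for the AdaSlicer module, can be illustrated with a small sketch. This is not the paper's algorithm: it uses a fixed, non-overlapping grid rather than adaptive slicing, and the names `select_patches`, `kept_fraction`, `patch`, and `threshold` are hypothetical, chosen only for this example. The objectness map is a plain nested list standing in for the mask that an ObjSeeker-like module would predict.

```python
# Illustrative sketch only: fixed-grid patch selection driven by an
# objectness mask. All names here are assumptions, not the paper's API,
# and the fixed grid is a simplification of adaptive slicing.

def select_patches(objectness, patch, threshold=0.5):
    """Return (top, left) origins of patch x patch tiles that contain at
    least one cell whose objectness score exceeds the threshold."""
    height = len(objectness)
    width = len(objectness[0]) if height else 0
    kept = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile_has_object = any(
                objectness[r][c] > threshold
                for r in range(top, min(top + patch, height))
                for c in range(left, min(left + patch, width))
            )
            if tile_has_object:
                kept.append((top, left))
    return kept


def kept_fraction(objectness, patch, threshold=0.5):
    """Fraction of tiles that survive selection: a rough proxy for the
    share of later-stage computation that remains."""
    height = len(objectness)
    width = len(objectness[0]) if height else 0
    rows = -(-height // patch)  # ceiling division
    cols = -(-width // patch)
    total = rows * cols
    if total == 0:
        return 0.0
    return len(select_patches(objectness, patch, threshold)) / total


# Toy 4x8 objectness map: a cluster in the top-left corner and a single
# high-scoring cell in the bottom-right corner.
mask = [
    [0.9, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.7, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6],
]
patches = select_patches(mask, patch=2)
fraction = kept_fraction(mask, patch=2)
print(patches, fraction)
```

In this toy case only two of the eight tiles are kept, so later stages would process a quarter of the feature map. A fixed grid like this can cut through an object that straddles a tile border, which is the truncation risk the summary says the adaptive slicing strategy is meant to reduce.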

ESOD: Efficient Small Object Detection on High-Resolution Images