DQ-DETR: DETR with Dynamic Query for Tiny Object Detection

Hou-I Liu, Yi-Xin Huang, Hong-Han Shuai, Wen-Huang Cheng
2024-09-23
Abstract: Despite previous DETR-like methods having performed successfully in generic object detection, tiny object detection is still a challenging task for them since the positional information of object queries is not customized for detecting tiny objects, whose scale is extraordinarily smaller than general objects. Also, DETR-like methods using a fixed number of queries make them unsuitable for aerial datasets, which only contain tiny objects, and the numbers of instances are imbalanced between different images. Thus, we present a simple yet effective model, named DQ-DETR, which consists of three different components: categorical counting module, counting-guided feature enhancement, and dynamic query selection to solve the above-mentioned problems. DQ-DETR uses the prediction and density maps from the categorical counting module to dynamically adjust the number of object queries and improve the positional information of queries. Our model DQ-DETR outperforms previous CNN-based and DETR-like methods, achieving state-of-the-art mAP 30.2% on the AI-TOD-V2 dataset, which mostly consists of tiny objects. Our code will be available at https://github.com/Katie0723/DQ-DETR.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper primarily addresses the following issues:

1. **Tiny Object Detection**: Existing DETR-like methods perform well in general object detection but still struggle with tiny objects. Tiny objects are far smaller than general objects, and the positional information carried by object queries is not tailored to detecting them.
2. **Fixed Number of Queries**: Existing DETR-like methods use a fixed number of queries, so they perform poorly when the number of instances is imbalanced across images. For example, some images contain a large number of tiny objects, while others contain only a few.

To address these issues, the authors propose the DQ-DETR model, which includes three main components:

- **Categorical Counting Module**: Predicts the number of instances in an image and generates a density map.
- **Counting-Guided Feature Enhancement**: Enhances the encoder's visual features by combining them with the density map.
- **Dynamic Query Selection**: Dynamically adjusts the number and positional information of queries based on the predicted number of instances.

With these components, DQ-DETR achieves a mean Average Precision (mAP) of 30.2% on the AI-TOD-V2 dataset, which the authors report as state of the art, outperforming previous CNN-based and DETR-like methods.
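
The idea behind dynamic query selection can be illustrated with a minimal Python sketch. It is not the authors' implementation: the category thresholds in `count_category`, the budgets in `QUERY_BUDGETS`, and the function names are hypothetical and chosen only for illustration. The sketch assumes that a counting step yields a coarse count category, that each category maps to a query budget K, and that the K highest-scoring density-map locations serve as query positions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical query budgets per count category (illustrative values only,
# not taken from the text above).
QUERY_BUDGETS: Dict[int, int] = {0: 300, 1: 500, 2: 900, 3: 1500}


def count_category(num_instances: int) -> int:
    """Bin an instance count into a coarse category (thresholds are assumptions)."""
    if num_instances <= 10:
        return 0
    if num_instances <= 100:
        return 1
    if num_instances <= 500:
        return 2
    return 3


def select_queries(
    density_map: List[List[float]],
    category: int,
    budgets: Optional[Dict[int, int]] = None,
) -> List[Tuple[int, int]]:
    """Return the (row, col) positions of the top-K density scores.

    K is looked up from the predicted count category, so images predicted
    to contain more instances receive more queries.
    """
    k = (budgets or QUERY_BUDGETS)[category]
    cells = [
        (score, row, col)
        for row, line in enumerate(density_map)
        for col, score in enumerate(line)
    ]
    # Highest score first; ties are broken by position for determinism.
    cells.sort(key=lambda cell: (-cell[0], cell[1], cell[2]))
    return [(row, col) for _, row, col in cells[:k]]
```

For example, `select_queries(density_map, count_category(42))` would use the budget assigned to category 1. A real model would predict the category from encoder features and operate on tensors; plain lists are used here only to keep the sketch self-contained.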