Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector

Yuqian Fu, Yu Wang, Yixuan Pan, Lian Huai, Xingyu Qiu, Zeyu Shangguan, Tong Liu, Yanwei Fu, Luc Van Gool, Xingqun Jiang
2024-09-27
Abstract: This paper studies the challenging cross-domain few-shot object detection (CD-FSOD), aiming to develop an accurate object detector for novel domains with minimal labeled examples. While transformer-based open-set detectors, such as DE-ViT, show promise in traditional few-shot object detection, their generalization to CD-FSOD remains unclear: 1) can such open-set detection methods easily generalize to CD-FSOD? 2) If not, how can models be enhanced when facing huge domain gaps? To answer the first question, we employ measures including style, inter-class variance (ICV), and indefinable boundaries (IB) to understand the domain gap. Based on these measures, we establish a new benchmark named CD-FSOD to evaluate object detection methods, revealing that most of the current approaches fail to generalize across domains. Technically, we observe that the performance decline is associated with our proposed measures: style, ICV, and IB. Consequently, we propose several novel modules to address these issues. First, the learnable instance features align initial fixed instances with target categories, enhancing feature distinctiveness. Second, the instance reweighting module assigns higher importance to high-quality instances with slight IB. Third, the domain prompter encourages features resilient to different styles by synthesizing imaginary domains without altering semantic contents. These techniques collectively contribute to the development of the Cross-Domain Vision Transformer for CD-FSOD (CD-ViTO), significantly improving upon the base DE-ViT. Experimental results validate the efficacy of our model.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper investigates cross-domain few-shot object detection (CD-FSOD). The goal is a model that accurately detects objects in new domains from only a few annotated examples. Transformer-based open-set detectors such as DE-ViT perform well in traditional few-shot object detection, but it is unclear whether they generalize to CD-FSOD. Specifically, the paper asks two questions:

1. **Can open-set detection methods easily generalize to CD-FSOD?**
2. **If not, how can the model be enhanced when facing large domain gaps?**

To answer the first question, the authors characterize the domain gap with three measures: style, inter-class variance (ICV), and indefinable boundaries (IB). Based on these measures, they establish a new benchmark, CD-FSOD, for evaluating object detection methods. On this benchmark, most existing methods generalize poorly across domains, and the drop in performance is associated with the three measures.

To answer the second question, the authors propose three modules, each aimed at one of these factors:

- **Learnable instance features**: the initially fixed instance features are made learnable and aligned with the target categories, which makes features more discriminative (targeting ICV).
- **Instance reweighting module**: high-quality instances with only slight IB receive higher weights, which mitigates the effect of ambiguous object boundaries.
- **Domain prompter**: synthesizes imaginary domains without altering semantic content, which encourages features that are robust to style changes.

Together, these modules form the Cross-Domain Vision Transformer for CD-FSOD (CD-ViTO), which significantly improves on the DE-ViT baseline. Experimental results validate the model's effectiveness.
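The summary does not give a formula for inter-class variance, so here is one plausible way to compute it. This sketch assumes ICV is the mean squared distance of each class's mean feature from the centroid of all class means. The helper names `class_means` and `inter_class_variance` are hypothetical, and the paper's exact definition and feature extractor may differ. A low value means the class features sit close together, so the classes are harder to separate.

```python
from statistics import fmean

def class_means(features, labels):
    """Average the feature vectors belonging to each class (hypothetical helper)."""
    groups = {}
    for vec, label in zip(features, labels):
        groups.setdefault(label, []).append(vec)
    return {label: [fmean(dim) for dim in zip(*vecs)] for label, vecs in groups.items()}

def inter_class_variance(features, labels):
    """Assumed ICV: mean squared distance of class centroids from their overall centroid.

    This is an illustrative definition, not necessarily the paper's formula.
    """
    means = list(class_means(features, labels).values())
    center = [fmean(dim) for dim in zip(*means)]
    return fmean(sum((m - c) ** 2 for m, c in zip(mu, center)) for mu in means)
```

For example, two classes whose centroids are `[0, 1]` and `[4, 1]` each lie 2 units from their midpoint, which gives an ICV of 4.0. Classes with identical centroids give 0.0.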
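The learnable-instance-features idea starts from instance features that are initially fixed and then lets them be updated toward the target categories. The minimal sketch below illustrates this under a strong simplification. It assumes a classifier that scores a query feature against each class prototype by dot product, starts the prototypes from given fixed values, and fits them by plain gradient descent on softmax cross-entropy. `refine_prototypes`, `lr`, and `steps` are hypothetical names. CD-ViTO itself fine-tunes inside a full detector with its own losses.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(logits):
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(protos, feats, labels):
    """Mean softmax cross-entropy of dot-product logits against class prototypes."""
    return -sum(math.log(softmax([dot(q, p) for p in protos])[y])
                for q, y in zip(feats, labels)) / len(feats)

def refine_prototypes(protos, feats, labels, lr=0.1, steps=50):
    """Hypothetical: treat fixed prototypes as learnable and fit them by gradient descent."""
    protos = [list(p) for p in protos]  # copy so the fixed initial values stay untouched
    n = len(feats)
    for _ in range(steps):
        grads = [[0.0] * len(p) for p in protos]
        for q, y in zip(feats, labels):
            probs = softmax([dot(q, p) for p in protos])
            for c, prob in enumerate(probs):
                coeff = prob - (1.0 if c == y else 0.0)  # dLoss/dlogit_c
                for d, qd in enumerate(q):
                    grads[c][d] += coeff * qd / n  # dlogit_c/dproto_c equals q
        for p, g in zip(protos, grads):
            for d, gd in enumerate(g):
                p[d] -= lr * gd
    return protos
```

Starting from poorly aligned prototypes such as `[[1.0, 1.0], [1.0, 0.9]]`, a few dozen steps pull each prototype toward the features of its own class, and the cross-entropy drops.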
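Instance reweighting can be pictured as building a class prototype from a weighted average of its instances rather than a plain mean. This sketch assumes each instance already has a quality score, standing in for whatever the real module learns. The scores are converted to weights with a temperature-scaled softmax. `reweighted_prototype` and `temperature` are hypothetical names, and the paper's scoring mechanism may differ.

```python
import math

def reweighted_prototype(instances, scores, temperature=1.0):
    """Hypothetical: combine instance features using softmax(score / temperature) weights.

    Higher-scoring instances (e.g. those with clearer boundaries) contribute more.
    """
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(instances[0])
    proto = [sum(w * inst[d] for w, inst in zip(weights, instances)) for d in range(dim)]
    return proto, weights
```

With equal scores this reduces to the plain mean, so instances `[0, 0]` and `[2, 2]` give `[1.0, 1.0]`. A much higher score on one instance makes the prototype collapse toward that instance. The temperature controls how sharply the weighting favors the top-scoring instances.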
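The domain prompter's goal is to change style without changing semantics. A well-known way to illustrate that separation is AdaIN-style re-normalization, which is swapped in here as a stand-in technique and is not necessarily the paper's mechanism. Each feature channel is normalized to remove its original mean and spread, then given a synthetic "domain's" scale and shift. The relative pattern within each channel, standing in for the content, is preserved. `apply_domain_style` and `sample_domain` are hypothetical helpers, and the sampling ranges are arbitrary assumptions.

```python
import random
from statistics import fmean, pstdev

def apply_domain_style(feature_map, gammas, betas, eps=1e-5):
    """Hypothetical AdaIN-style restyling: normalize each channel, then apply new scale/shift."""
    styled = []
    for channel, gamma, beta in zip(feature_map, gammas, betas):
        mu = fmean(channel)
        sigma = pstdev(channel) + eps  # eps avoids division by zero on flat channels
        styled.append([gamma * (x - mu) / sigma + beta for x in channel])
    return styled

def sample_domain(num_channels, rng):
    """Draw one imaginary domain as per-channel (scale, shift); ranges are arbitrary."""
    gammas = [rng.uniform(0.5, 1.5) for _ in range(num_channels)]
    betas = [rng.gauss(0.0, 0.5) for _ in range(num_channels)]
    return gammas, betas
```

Each restyled channel ends up with mean close to `beta` and spread close to `gamma`. Because `gamma` is positive, the ordering of values within the channel is unchanged. Sampling many such domains gives a cheap source of style variation to train against.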