CLIP-Guided Source-Free Object Detection in Aerial Images

Nanqing Liu,Xun Xu,Yongyi Su,Chengxin Liu,Peiliang Gong,Heng-Chao Li
2024-05-30
Abstract:Domain adaptation is crucial in aerial imagery, as the visual representation of these images can significantly vary based on factors such as geographic location, time, and weather conditions. Additionally, high-resolution aerial images often require substantial storage space and may not be readily accessible to the public. To address these challenges, we propose a novel Source-Free Object Detection (SFOD) method. Specifically, our approach begins with a self-training framework, which significantly enhances the performance of baseline methods. To alleviate the noisy labels in self-training, we utilize Contrastive Language-Image Pre-training (CLIP) to guide the generation of pseudo-labels, termed CLIP-guided Aggregation (CGA). By leveraging CLIP's zero-shot classification capability, we aggregate its scores with the original predicted bounding boxes, enabling us to obtain refined scores for the pseudo-labels. To validate the effectiveness of our method, we constructed two new datasets from different domains based on the DIOR dataset, named DIOR-C and DIOR-Cloudy. Experimental results demonstrate that our method outperforms other comparative algorithms. The code is available at <a class="link-external link-https" href="https://github.com/Lans1ng/SFOD-RS" rel="external noopener nofollow">this https URL</a>.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?