Few-shot Object Detection in Remote Sensing: Lifting the Curse of Incompletely Annotated Novel Objects

Fahong Zhang,Yilei Shi,Zhitong Xiong,Xiao Xiang Zhu
2023-09-19
Abstract: Object detection is an essential and fundamental task in computer vision and satellite image processing. Existing deep learning methods have achieved impressive performance thanks to the availability of large-scale annotated datasets. Yet, in real-world applications the availability of labels is limited. In this context, few-shot object detection (FSOD) has emerged as a promising direction, which aims at enabling the model to detect novel objects when only a few of them are annotated. However, many existing FSOD algorithms overlook a critical issue: when an input image contains multiple novel objects and only a subset of them is annotated, the unlabeled objects are treated as background during training. This can cause confusion and severely impair the model's ability to recall novel objects. To address this issue, we propose a self-training-based FSOD (ST-FSOD) approach, which incorporates the self-training mechanism into the few-shot fine-tuning process. ST-FSOD aims to discover novel objects that are not annotated and to take them into account during training. On the one hand, we devise a two-branch region proposal network (RPN) to separate the proposal extraction of base and novel objects. On the other hand, we incorporate the student-teacher mechanism into the RPN and the region of interest (RoI) head to include highly confident yet unlabeled targets as pseudo labels. Experimental results demonstrate that our proposed method outperforms the state of the art in various FSOD settings by a large margin. The code will be publicly available at https://github.com/zhu-xlab/ST-FSOD.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses few-shot object detection (FSOD) in remote sensing images. Although existing deep learning methods perform well on large-scale annotated datasets, labels are often scarce in real-world applications. The paper points out that when an input image contains multiple novel objects and only some of them are annotated, existing FSOD algorithms treat the unlabeled ones as background, which causes confusion and severely reduces the model's recall of novel objects. To solve this problem, the paper proposes a self-training-based FSOD (ST-FSOD) method. ST-FSOD integrates the self-training mechanism into the few-shot fine-tuning process to discover unlabeled novel objects and incorporate them into training. The implementation has two parts: a self-training region proposal network (ST-RPN) and a self-training bounding box head (ST-BBH). ST-RPN adopts a two-branch structure to separate proposal extraction for base and novel objects, while ST-BBH uses a student-teacher mechanism to treat high-confidence but unlabeled targets as pseudo labels. Experimental results show that this method outperforms existing techniques in various FSOD settings. The paper also reviews related work, including object detection, few-shot learning, self-training, and FSOD in remote sensing images. The authors emphasize that incompletely annotated novel objects are an important problem in remote sensing images and a challenge overlooked by existing methods. Through self-training, the method identifies potential unlabeled novel objects so that they are no longer treated as background, thereby improving the detection accuracy of the model.
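The pseudo-labeling idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the score threshold of 0.8, and the IoU threshold of 0.5 are all hypothetical choices made for illustration. The sketch assumes a teacher model has already produced scored boxes, keeps the confident ones, and drops those that overlap an existing annotation, so that what remains are "highly confident yet unlabeled" candidates.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_pseudo_labels(
    teacher_dets: List[Tuple[Box, float]],
    annotated: List[Box],
    score_thr: float = 0.8,  # hypothetical confidence threshold
    iou_thr: float = 0.5,    # hypothetical overlap threshold
) -> List[Box]:
    """Keep confident teacher detections that match no annotated box."""
    pseudo: List[Box] = []
    for box, score in teacher_dets:
        if score < score_thr:
            continue  # not confident enough to trust as a pseudo label
        if any(iou(box, gt) >= iou_thr for gt in annotated):
            continue  # already covered by a real annotation
        pseudo.append(box)
    return pseudo
```

For example, given one annotated box and three teacher detections (one duplicating the annotation, one confident and unannotated, one with a low score), only the confident unannotated box is returned. During fine-tuning, such boxes could be added to the training targets instead of being counted as background.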