SGFNet: Structure-Guided Few-Shot Object Detection

Jingkai Ma, Shuang Bai
DOI: https://doi.org/10.1109/tcsvt.2024.3507863
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Few-shot object detection (FSOD) focuses on detecting objects of novel classes from only a small number of annotated samples. Because novel-class samples are scarce and intra-class variance is present, current FSOD methods struggle to acquire enough discriminative information to represent each class, which limits their performance. To address this issue, we propose a Structure-Guided Few-shot object detection (SGFNet) method that uses the structural information of targets to provide richer discriminative information. Specifically, we first design a Multi-Frequency Structural Feature (MFSF) module, which extracts the highly discriminative structural information of objects in images and uses it to make the target features more discriminative. Building on the MFSF, we then propose a Saliency Information Enhancement (SIE) module that uses saliency information to enhance object-related structural features while suppressing background interference. In addition, we present a novel Soft Cosine Classifier (SCC), based on soft cosine similarity, that extracts consistent discriminative information between the support and query features for distinguishing targets. Extensive experiments on PASCAL VOC and MS COCO demonstrate that our method significantly outperforms a strong baseline (by up to 13.8%) and previous state-of-the-art methods (by 4.8% on average).
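For readers unfamiliar with the measure the SCC is built on, the following is a minimal sketch of the generic soft cosine similarity between two feature vectors. It is not the paper's classifier: the abstract does not say how the SCC constructs or learns the feature-similarity matrix, so the matrix `S` below, the function names, and the toy vectors are all assumptions made for illustration. With `S` set to the identity matrix, the measure reduces to ordinary cosine similarity.

```python
import math


def bilinear(a, S, b):
    """Return a^T S b for plain Python lists (hypothetical helper)."""
    return sum(a[i] * S[i][j] * b[j]
               for i in range(len(a))
               for j in range(len(b)))


def soft_cosine(a, b, S):
    """Generic soft cosine similarity: a^T S b / (sqrt(a^T S a) * sqrt(b^T S b)).

    S[i][j] encodes how similar feature dimensions i and j are.
    How S is obtained is an assumption here, not taken from the paper.
    """
    denom = math.sqrt(bilinear(a, S, a)) * math.sqrt(bilinear(b, S, b))
    if denom == 0.0:
        return 0.0  # convention chosen for this sketch when a vector has zero norm
    return bilinear(a, S, b) / denom


# Toy "support" and "query" features that activate different dimensions.
support = [1.0, 0.0]
query = [0.0, 1.0]

identity = [[1.0, 0.0], [0.0, 1.0]]   # ordinary cosine similarity
related = [[1.0, 0.5], [0.5, 1.0]]    # dimensions treated as partly similar

plain_score = soft_cosine(support, query, identity)
soft_score = soft_cosine(support, query, related)
```

In this toy case the two vectors are orthogonal under plain cosine similarity, but they receive a positive score once the off-diagonal entries of `S` declare the two dimensions related. That is the general property that makes the measure tolerant of features that are similar without being identical.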