CRTED: Few-Shot Object Detection via Correlation-RPN and Transformer Encoder–Decoder

Jinlong Chen, Kejian Xu, Yi Ning, Lianyuan Jiang, Zhi Xu
DOI: https://doi.org/10.3390/electronics13101856
IF: 2.9
2024-05-10
Electronics
Abstract: Few-shot object detection (FSOD) aims to reduce the large number of annotations that conventional object detection requires for training, which are very labor-intensive to produce. However, existing few-shot methods either achieve high precision at the cost of time-consuming, exhaustive fine-tuning or adapt poorly to novel classes. We presume the major reason is that the valuable correlation features among different categories are insufficiently exploited, which hinders the generalization of knowledge from base to novel categories. In this paper, we propose few-shot object detection via Correlation-RPN and transformer encoder–decoder (CRTED), a novel training network that learns object-relevant features of inter-class correlation and intra-class compactness while suppressing object-agnostic background features with limited annotated samples. We also introduce a four-way tuple-contrast training strategy to positively activate the training progress of our object detector. Experiments on two few-shot benchmarks (Pascal VOC and MS-COCO) demonstrate that the proposed CRTED, without further fine-tuning, achieves performance comparable to current state-of-the-art fine-tuned methods. The code and pre-trained models will be released.
Engineering, Electrical & Electronic; Physics, Applied; Computer Science, Information Systems
What problem does this paper attempt to address?
The paper addresses several key issues in few-shot object detection (FSOD):

1. **Reducing the need for annotated data**: Conventional object detectors require a large amount of annotated training data, which is time-consuming and costly to produce. Few-shot object detection aims to detect objects effectively with only a small number of annotated samples.
2. **Improving adaptability to novel categories**: Existing few-shot methods either reach high accuracy only after extensive fine-tuning or adapt poorly to novel categories. The proposed method aims to improve generalization to novel categories by making fuller use of the correlation features among different categories.
3. **Optimizing the Region Proposal Network (RPN)**: In few-shot scenarios, an RPN often generates low-quality region proposals because the support images carry limited information, which makes it hard to separate target objects from the background. The paper proposes a Correlation-RPN to address this issue.
4. **Introducing a four-way tuple-contrast training strategy**: The paper introduces a four-way tuple-contrast training strategy that is intended to positively activate the training progress of the detector.

The main contributions of the paper are:

- A correlation-aware region proposal network (Correlation-RPN), which improves the detector's object localization and generalization ability.
- A new feature encoding mechanism that integrates a transformer encoder-decoder into the model to learn support-query feature similarity representations.
- A four-way tuple-contrast training strategy, with which CRTED achieves performance comparable to current state-of-the-art fine-tuned methods without further fine-tuning.
These innovations collectively address the key challenges in few-shot object detection, providing new ideas and methods for research in this field.
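The general idea behind a correlation-aware proposal stage can be illustrated with a small sketch. The summary above does not specify the operator used inside Correlation-RPN, so the code below is only an assumption-laden illustration: it uses plain cosine similarity between a pooled support prototype and each query location as a stand-in, and the function names (`pool_support`, `correlation_map`) are hypothetical, not taken from the paper or its code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def pool_support(support_feats):
    """Average per-location channel vectors of a support object into one prototype.

    Assumption: average pooling is used here only for illustration.
    """
    n = len(support_feats)
    channels = len(support_feats[0])
    return [sum(f[c] for f in support_feats) / n for c in range(channels)]


def correlation_map(query_feats, prototype):
    """Cosine similarity between the prototype and every query location.

    query_feats is an H x W grid (list of rows) of channel vectors.
    High values mark locations that resemble the support object; a proposal
    stage could use such a map to favor object-relevant regions.
    """
    proto = l2_normalize(prototype)
    return [
        [sum(q * p for q, p in zip(l2_normalize(cell), proto)) for cell in row]
        for row in query_feats
    ]


# Toy example: 2-channel features, two support locations, a 2 x 2 query grid.
support = [[1.0, 0.0], [0.8, 0.2]]
query = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.9, 0.1], [0.0, 0.0]],
]
corr = correlation_map(query, pool_support(support))
```

In this toy grid, the locations whose features point in the same direction as the support prototype receive the highest scores, while the empty (all-zero) location scores zero. A real detector would compute such correlations on convolutional feature maps and learn the comparison rather than fix it by hand.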