Context-aware and Semantic-consistent Spatial Interactions for One-shot Object Detection without Fine-tuning

Hanqing Yang, Sijia Cai, Bing Deng, Jieping Ye, Guosheng Lin, Yu Zhang
DOI: https://doi.org/10.1109/tcsvt.2023.3349007
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: One-shot object detection (OSOD) without fine-tuning has recently attracted considerable research attention. It aims to detect novel-class objects in a target image directly from a single support image patch, with no fine-tuning stage. However, most existing methods match image pairs without accounting for the scale inconsistency and spatial semantic mismatch between them, which limits their ability to acquire high-quality target-support related features. This paper addresses these limitations by incorporating cross-scale contexts and semantic-consistent cues that are robust to scarce and ambiguous matching. Specifically, we first introduce a simple yet effective Aggregation-Transformer-based Pyramid (ATP) module that explores long-range cross-scale spatial interactions by combining a customized size-aware aggregation approach with a vanilla transformer encoder, so that coarse-to-fine local image patterns are fully utilized. Furthermore, we formulate a 4D contrastive cross-correlation tensor for instance-level feature matching and propose a Geometric Consistent Correlation (GCC) module that uses bidirectional spatial-aware convolutions to extract long-range semantic correspondences for target-support pairs. Additionally, a Channel Contrastive Learning (CCL) branch complements the GCC module with inter-channel interactions between target-support pairs. Extensive experiments demonstrate that our approach significantly outperforms the previous state-of-the-art methods by 6.5% and 2.1% on the PASCAL VOC and COCO datasets, respectively, for unseen classes.
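To make the idea of a 4D cross-correlation tensor between a target and a support feature map concrete, the following is a minimal sketch and not the paper's implementation. It assumes plain cosine similarity between every pair of spatial positions; the abstract does not specify the contrastive formulation, the GCC convolutions, or the feature extractor, so all of those are omitted. The function name `correlation_4d`, the feature shapes, and the random inputs are hypothetical and chosen only for illustration.

```python
import numpy as np


def correlation_4d(target, support, eps=1e-8):
    """Cosine-similarity correlation between two feature maps.

    target:  array of shape (C, Ht, Wt)
    support: array of shape (C, Hs, Ws)
    returns: array of shape (Ht, Wt, Hs, Ws), where entry [h, w, i, j]
             is the cosine similarity between target position (h, w)
             and support position (i, j).
    """
    # L2-normalize each spatial position along the channel axis.
    t = target / (np.linalg.norm(target, axis=0, keepdims=True) + eps)
    s = support / (np.linalg.norm(support, axis=0, keepdims=True) + eps)
    # Contract over channels, keeping all four spatial axes.
    return np.einsum("chw,cij->hwij", t, s)


# Hypothetical feature maps standing in for backbone outputs (assumption).
rng = np.random.default_rng(0)
target_feat = rng.standard_normal((16, 8, 10))
support_feat = rng.standard_normal((16, 4, 4))

corr = correlation_4d(target_feat, support_feat)
print(corr.shape)
```

Each slice `corr[h, w]` is a small similarity map over the support patch, which is the kind of structure that convolutions over the target and support spatial axes could then process.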
engineering, electrical & electronic