M-RRFS: A Memory-Based Robust Region Feature Synthesizer for Zero-Shot Object Detection
Peiliang Huang, Dingwen Zhang, De Cheng, Longfei Han, Pengfei Zhu, Junwei Han
DOI: https://doi.org/10.1007/s11263-024-02112-9
IF: 13.369
2024-05-24
International Journal of Computer Vision
Abstract: With the goal of detecting both the object categories that appear in the training phase and those never observed before testing, zero-shot object detection (ZSD) has become a challenging yet anticipated task in the community. Current approaches tackle this problem by drawing on the feature synthesis techniques used in the zero-shot image classification (ZSC) task without delving into the inherent problems of ZSD. In this paper, we analyze the outstanding challenges that ZSD presents compared with ZSC, namely severe intra-class variation, complex category co-occurrence, and the open test scenario, and reveal how they interfere with the region feature synthesis process. In view of this, we propose a novel memory-based robust region feature synthesizer (M-RRFS) for ZSD, which is equipped with the Intra-class Semantic Diverging (IntraSD), Inter-class Structure Preserving (InterSP), and Cross-Domain Contrast Enhancing (CrossCE) mechanisms to overcome the problems of inadequate intra-class diversity, insufficient inter-class separability, and weak inter-domain contrast. Moreover, when designing the whole learning framework, we develop an asynchronous memory container (AMC) that explores the cross-domain relationship between the seen class domain and the unseen class domain to reduce the overlap between their distributions. Based on the AMC, a memory-assisted ZSD inference process is also proposed to further boost prediction accuracy. To evaluate the proposed approach, we conduct comprehensive experiments on the MS-COCO, PASCAL VOC, ILSVRC, and DIOR datasets and achieve superior performance. Notably, we achieve new state-of-the-art performance on the MS-COCO dataset under the 48/17 category split setting, i.e., 64.0, 60.9, and 55.5 Recall@100 at three respective IoU thresholds, and 15.1 mAP at a single IoU threshold. Meanwhile, the experiments on the DIOR dataset establish the first benchmark for evaluating zero-shot object detection performance on remote sensing images.
Code: https://github.com/HPL123/M-RRFS
computer science, artificial intelligence