Zero-Shot Object Detection by Semantics-Aware DETR with Adaptive Contrastive Loss.

Huan Liu, Lu Zhang, Jihong Guan, Shuigeng Zhou
DOI: https://doi.org/10.1145/3581783.3612523
2023-01-01
Abstract: Zero-shot object detection (ZSD) aims to localize and recognize unseen objects in unconstrained images by leveraging semantic descriptions. Existing ZSD methods typically suffer from two drawbacks: 1) because no data on unseen categories is available during training, the model is inevitably biased towards the seen categories, i.e., it tends to classify objects of unseen categories as seen categories; 2) it is difficult for a feature extractor trained only on data of seen categories to learn features discriminative enough to transfer the knowledge learned from seen categories to unseen categories. To tackle these problems, this paper proposes a novel zero-shot detection method based on a semantics-aware DETR and a class-wise adaptive contrastive loss. Concretely, to address the first problem, we develop a novel semantics-aware attention mechanism that mitigates the bias towards seen categories and integrate it into DETR, which results in a new end-to-end zero-shot object detection approach. To handle the second problem, we propose a novel class-wise adaptive contrastive loss, which considers the relevance between each pair of categories according to their semantic descriptions in order to learn separable features for better visual-semantic alignment. Extensive experiments and ablation studies on benchmark datasets demonstrate the effectiveness and superiority of the proposed method.
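The abstract does not give the formula for the class-wise adaptive contrastive loss, so the following is only an illustrative sketch of the general idea: a contrastive loss whose negative terms are weighted by how semantically related each pair of categories is. Everything here is an assumption made for illustration and not the paper's formulation: the function names, the weighting rule `1 + max(0, cosine)`, the temperature value, and the premise that visual features have already been projected into the same space as the class semantic embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_class_weights(class_embeddings):
    """Hypothetical weighting rule (an assumption, not from the paper):
    a negative class that is semantically closer to the target class
    gets a larger weight, so it is pushed away more strongly."""
    n = len(class_embeddings)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim = cosine(class_embeddings[i], class_embeddings[j])
                weights[i][j] = 1.0 + max(0.0, sim)
    return weights


def adaptive_contrastive_loss(features, labels, class_embeddings, temperature=0.1):
    """Mean weighted contrastive loss between visual features and class embeddings.

    features: list of feature vectors, assumed to live in the semantic space.
    labels: list of integer class indices, one per feature.
    class_embeddings: list of semantic vectors, one per class.
    """
    weights = pairwise_class_weights(class_embeddings)
    total = 0.0
    for feat, y in zip(features, labels):
        logits = [cosine(feat, emb) / temperature for emb in class_embeddings]
        shift = max(logits)  # subtract the max logit for numerical stability
        pos = math.exp(logits[y] - shift)
        neg = sum(
            weights[y][c] * math.exp(logits[c] - shift)
            for c in range(len(class_embeddings))
            if c != y
        )
        total += -math.log(pos / (pos + neg))
    return total / len(features)


# Toy example: class 2 is semantically close to class 0, class 1 is unrelated.
toy_classes = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
aligned = adaptive_contrastive_loss([[1.0, 0.0]], [0], toy_classes)
misaligned = adaptive_contrastive_loss([[1.0, 0.0]], [1], toy_classes)
```

In this toy setup a feature that matches its own class embedding yields a lower loss than the same feature paired with an unrelated label, and the semantically close class contributes more to the negative term than the unrelated one. How the paper actually derives and applies its per-pair relevance may differ.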