Advance One-Shot Multispectral Instance Detection With Text's Supervision

Chen Feng, Jian Cheng, Yang Xiao, Zhiguo Cao
DOI: https://doi.org/10.1109/lsp.2024.3411516
2024-06-21
IEEE Signal Processing Letters
Abstract: A key issue in one-shot multispectral instance detection (OMID) is extracting features with strong instance discriminative power, domain adaptation capability, and instance-wise generality. Existing methods generally rely only on visual cues. By comparison, text offers structured information, high-level semantics, and low noise. Inspired by the recent emergence of large image-text datasets and breakthrough vision-language models, we propose to advance OMID with text supervision for the first time. Our key idea is to relate the one-shot multispectral instance to ImageNet class labels via the CLIP model. Specifically, we retrieve, rank, and ensemble the text features of ImageNet labels, using the instance image feature as the query. The resulting instance image and text features are then realigned and fused into a multimodal feature. Meanwhile, a multispectral contrastive learning approach is proposed to drive multimodal feature learning for OMID. All procedures are trained end-to-end in a unified network. In this way, instance discriminative power and domain adaptation capability are improved simultaneously. Experiments on two tailored multispectral instance detection datasets verify the effectiveness of our method.
engineering, electrical & electronic
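
The retrieve, rank, and ensemble step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no details on the ranking rule, the ensemble weights, or the fusion operator, so the cosine-similarity ranking, softmax weighting, temperature value, and additive fusion below are all assumptions. The function names are hypothetical, and small hand-written vectors stand in for CLIP image and text embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve_rank_ensemble(image_feat, label_feats, k=2, temperature=0.1):
    """Use the instance image feature as a query over label text features.

    Assumed procedure (not from the paper): rank labels by cosine similarity,
    keep the top k, and ensemble them with softmax weights.
    """
    query = l2_normalize(image_feat)
    scored = []
    for label, feat in label_feats.items():
        text = l2_normalize(feat)
        scored.append((dot(query, text), label, text))
    scored.sort(key=lambda item: item[0], reverse=True)  # rank by similarity
    top = scored[:k]

    # Softmax over the top-k similarities (max subtracted for stability).
    best = top[0][0]
    exps = [math.exp((score - best) / temperature) for score, _, _ in top]
    total = sum(exps)
    weights = [e / total for e in exps]

    ensembled = [
        sum(w * text[i] for w, (_, _, text) in zip(weights, top))
        for i in range(len(query))
    ]
    return [label for _, label, _ in top], l2_normalize(ensembled)


def fuse(image_feat, text_feat):
    """Assumed fusion: add the normalized features and renormalize."""
    query = l2_normalize(image_feat)
    return l2_normalize([a + b for a, b in zip(query, text_feat)])


# Placeholder vectors standing in for CLIP embeddings of ImageNet labels.
label_feats = {
    "tabby cat": [1.0, 0.0, 0.0],
    "sports car": [0.0, 1.0, 0.0],
    "tiger": [0.9, 0.1, 0.0],
}
image_feat = [1.0, 0.05, 0.0]

top_labels, text_feat = retrieve_rank_ensemble(image_feat, label_feats, k=2)
multimodal_feat = fuse(image_feat, text_feat)
```

In the paper these steps are learned end-to-end inside a single network together with the multispectral contrastive objective; the sketch only shows the data flow from an image query to a fused image-text feature.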