WEA-DINO: An Improved DINO With Word Embedding Alignment for Remote Scene Zero-Shot Object Detection

Guangbiao Wang, Hongbo Zhao, Qing Chang, Shuchang Lyu, Guangliang Cheng, Huojin Chen
DOI: https://doi.org/10.1109/lgrs.2024.3408875
IF: 5.343
2024-06-15
IEEE Geoscience and Remote Sensing Letters
Abstract: Remote sensing scene zero-shot object detection (ZSD) aims to detect and recognize both seen and unseen categories of landscape elements, guided by word embeddings. This task poses two primary challenges. First, landscape elements vary considerably within a category, causing a misalignment between visual features and word embeddings that is particularly noticeable for unseen categories. Second, existing detection models struggle to provide accurate localization predictions, which greatly degrades overall performance. To address these two issues, we propose word embedding alignment-DINO (WEA-DINO). Based on the original DINO structure, our WEA-DINO-Head is specifically designed to align the hidden features of "matching queries" with word embedding features, addressing the misalignment between visual features and word embeddings. Furthermore, aligning the hidden features of "denoising queries" with word embedding features enables localization capabilities to transfer from seen categories to unseen ones. In extensive experiments on the DIOR benchmark dataset, our method demonstrates state-of-the-art (SOTA) performance. The code is available at https://github.com/cv516Buaa/WEA-DINO.
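The alignment idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it only assumes that a query's hidden feature is linearly projected into the word embedding space and scored against each category embedding by cosine similarity. The function names, the projection matrix `weight`, the `temperature` value, and the toy 3-dimensional embeddings are all hypothetical.

```python
import math

def project(feature, weight):
    """Linear projection; `weight` holds one row per output dimension."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weight]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)

def classify_query(hidden, weight, word_embeddings, temperature=0.1):
    """Score one query feature against every category word embedding.

    Returns a softmax distribution over category names. Unseen categories
    can be scored as long as their word embeddings are available.
    """
    projected = project(hidden, weight)
    names = list(word_embeddings)
    logits = [cosine(projected, word_embeddings[n]) / temperature for n in names]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return {n: e / total for n, e in zip(names, exps)}

# Toy word embeddings (hypothetical values, not real word vectors).
word_embeddings = {
    "airplane": [1.0, 0.0, 0.0],
    "ship": [0.0, 1.0, 0.0],
    "windmill": [0.0, 0.0, 1.0],  # treated as an unseen category here
}

# Hypothetical projection from a 4-d hidden feature to the 3-d embedding space.
weight = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

hidden = [0.1, 0.2, 0.9, 0.5]  # hidden feature of one query (made up)
scores = classify_query(hidden, weight, word_embeddings)
best = max(scores, key=scores.get)
print(best, scores)
```

In a trained detector the projection would be learned on seen categories only; the sketch shows why the same scoring step extends to an unseen category once its word embedding is supplied.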
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics