Mining Contrastive Relations Between Cross-Modal Features for Zero-Shot Remote Sensing Image Scene Classification

Chun Liu, Suqiang Ma, Zheng Li, Wei Yang, Zhigang Han
DOI: https://doi.org/10.1109/lgrs.2024.3368344
IF: 5.343
2024-03-09
IEEE Geoscience and Remote Sensing Letters
Abstract: Zero-shot image scene classification aims to recognize image scenes from classes that are not seen during training. To address this problem, cross-modal feature alignment methods have been proposed in recent years. These methods mainly focus on matching the visual features of each image scene with its corresponding semantic descriptors in a latent space. Less attention has been paid to the contrastive relationships between different image scenes and different semantic descriptors. In this work, we propose a multilevel feature alignment method that mines contrastive relations between cross-modal features for zero-shot classification of remote sensing image scenes. While promoting single-instance-level positive alignment between each image scene and its corresponding semantic descriptors, the proposed method learns to keep the visual and semantic features of different classes apart from each other in the latent space. Extensive experiments show that the proposed method performs better on zero-shot remote sensing image scene classification. All code and data are available on GitHub at https://github.com/masuqiang/MCFA-Pytorch.
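The contrastive idea in the abstract can be illustrated with a generic InfoNCE-style cross-modal loss. This is a minimal sketch under stated assumptions, not the authors' MCFA objective (their implementation is in the linked repository). It assumes image features and class semantic descriptors have already been projected into a shared latent space. For each image, the descriptor of its own class is the positive and the descriptors of all other classes are negatives. The function names `contrastive_alignment_loss` and `predict`, the cosine similarity, and the `temperature` parameter are hypothetical choices made for illustration. The sketch covers only the image-to-descriptor contrast and does not reproduce the paper's full multilevel alignment.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (assumes neither is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_alignment_loss(
    visual: List[Vector],
    semantic: List[Vector],
    labels: List[int],
    temperature: float = 0.1,
) -> float:
    """Hypothetical InfoNCE-style cross-modal loss (illustrative, not MCFA).

    visual[i]   : latent visual feature of image i
    semantic[c] : latent semantic descriptor of class c
    labels[i]   : class index of image i

    Minimizing the loss pulls each image toward its own class descriptor
    and pushes it away from the descriptors of all other classes.
    """
    total = 0.0
    for v, y in zip(visual, labels):
        # Temperature-scaled similarity of this image to every class descriptor.
        logits = [cosine(v, a) / temperature for a in semantic]
        # Numerically stable log-sum-exp over all class descriptors.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log softmax probability of the true class.
        total += log_denom - logits[y]
    return total / len(visual)


def predict(visual: Vector, semantic: List[Vector]) -> int:
    """Zero-shot prediction: index of the most similar class descriptor."""
    scores = [cosine(visual, a) for a in semantic]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    semantic = [[1.0, 0.0], [0.0, 1.0]]   # toy descriptors for two classes
    labels = [0, 1]
    aligned = [[0.9, 0.1], [0.1, 0.9]]    # each image near its own class
    swapped = [[0.1, 0.9], [0.9, 0.1]]    # each image near the wrong class
    print(contrastive_alignment_loss(aligned, semantic, labels))
    print(contrastive_alignment_loss(swapped, semantic, labels))
    print(predict([0.2, 0.8], semantic))
```

The aligned toy batch yields a much smaller loss than the swapped one, which is the behavior a contrastive alignment term should reward. In a zero-shot setting, the descriptors passed to `predict` at test time would be those of the unseen classes, so no training images from those classes are needed. A smaller `temperature` sharpens the softmax and penalizes hard negatives more strongly. The value 0.1 here is an arbitrary illustrative default.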
Imaging Science & Photographic Technology; Remote Sensing; Engineering, Electrical & Electronic; Geochemistry & Geophysics