Transformer-Based Few-Shot Object Detection with Multi-Relation Matching for Remote Sensing Images

Lefan Wang, Jiawei Lian, Yan Feng, Xiaoning Chen, Shaohui Mei
DOI: https://doi.org/10.1109/igarss53475.2024.10642409
2024-01-01
Abstract: Few-shot object detection (FSOD) on remote sensing images (RSIs) has attracted significant research interest because it can detect novel classes from very few training examples in challenging remote sensing scenarios. Meta-learning FSOD methods based on Faster R-CNN and YOLO structures use a two-branch Siamese network as the backbone and compute the similarity between image regions for detection. However, almost all of these methods extract features with convolutional neural networks (CNNs). Inspired by the improved performance of transformer backbones on downstream tasks, we propose a transformer-based FSOD method that employs a transformer backbone with asymmetric-batched cross-attention for two-branch feature extraction. To improve classification performance, the model introduces a Multi-Relation Matching (MRM) head that strengthens the learning of similarity relations between the two branches. Comprehensive experiments on DIOR benchmarks demonstrate the effectiveness of the model.
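
The abstract notes that meta-learning FSOD methods compute the similarity between image regions across two branches. The sketch below illustrates only that generic idea, using class prototypes and cosine similarity on toy feature vectors. It is not the paper's MRM head or its asymmetric-batched cross-attention, whose details the abstract does not give. The function names (`class_prototype`, `match_region`) and the feature values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def class_prototype(support_features):
    """Average the few support-branch features of one class (assumed pooling choice)."""
    n = len(support_features)
    dim = len(support_features[0])
    return [sum(f[i] for f in support_features) / n for i in range(dim)]


def match_region(query_feature, support_sets):
    """Score one query-branch region feature against every class prototype.

    support_sets maps a class name to its list of support feature vectors.
    Returns the best-matching class name and the full score dictionary.
    """
    scores = {
        name: cosine_similarity(query_feature, class_prototype(feats))
        for name, feats in support_sets.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy example: two novel classes with two support features each (made-up values).
support_sets = {
    "airplane": [[1.0, 0.0], [0.8, 0.2]],
    "ship": [[0.0, 1.0], [0.2, 0.8]],
}
best, scores = match_region([0.9, 0.1], support_sets)
print(best, scores)
```

In a real detector the vectors would come from the backbone's region features, and a learned matching head would replace the fixed cosine score.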