Abstract: In this article, we attempt to achieve one-shot object detection by mimicking the human ability to learn new concepts from limited references: given a query image of an unseen class, the goal is to detect all instances of that class in a target image. Humans owe this one-shot learning ability to the brain's capacity to quickly extract and process the information shared between query and target images, a capability that a one-shot object detection framework must learn to reproduce. Moreover, extracting features of the query class from target images is difficult because remote sensing images have complex and diverse backgrounds. To address these issues, we propose a solo-to-collaborative dual-attention network (SCoDANet) that enhances image feature representations hierarchically, first within each image and then across query–target image pairs. It consists of three components: 1) a solo-attention head that strengthens the compactness of intraclass feature representations within an image and suppresses background interference by selectively aggregating similar features along the spatial and channel dimensions, respectively; 2) a dual coattention module that guides the region proposal network (RPN) to generate region proposals related to the query class by mining the co-information shared by each query–target feature pair; and 3) a nonlinear matching module that measures the similarity between the query feature and the proposals of the target image, yielding a more robust detector. Extensive experiments on two benchmarks demonstrate the effectiveness of our method for one-shot detection of both seen and unseen object categories.
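
The solo-attention head in component 1) aggregates similar features along the spatial and channel dimensions. The abstract does not specify the exact layers, so the following is only a minimal NumPy sketch under stated assumptions: a generic spatial (position) self-attention and a generic channel self-attention, each with a residual connection, fused by summation. The projection matrices `w_q`, `w_k`, `w_v`, the scale `gamma`, and the sum fusion are hypothetical choices for illustration, not the authors' design.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_self_attention(feat, w_q, w_k, w_v, gamma=1.0):
    """Re-weight each position by its similarity to every other position.

    feat: (C, H, W) features of one image.
    w_q, w_k: (Cr, C) and w_v: (C, C) are hypothetical 1x1-conv weights.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)          # (C, N), N = H*W positions
    q, k, v = w_q @ x, w_k @ x, w_v @ x
    attn = softmax(q.T @ k, axis=-1)    # (N, N); row i sums to 1 over positions
    out = v @ attn.T                    # column i = sum_j attn[i, j] * v[:, j]
    return (gamma * out + x).reshape(c, h, w)  # residual connection


def channel_self_attention(feat, gamma=1.0):
    """Re-weight each channel by its similarity to every other channel."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)          # (C, N)
    attn = softmax(x @ x.T, axis=-1)    # (C, C); row i sums to 1 over channels
    out = attn @ x                      # row i = sum_j attn[i, j] * x[j, :]
    return (gamma * out + x).reshape(c, h, w)


def solo_attention_head(feat, w_q, w_k, w_v):
    # Hypothetical fusion: sum the spatial and channel branches.
    return spatial_self_attention(feat, w_q, w_k, w_v) + channel_self_attention(feat)


# Usage with random features and weights.
rng = np.random.default_rng(0)
C, H, W, Cr = 8, 5, 6, 4
feat = rng.standard_normal((C, H, W))
w_q = 0.1 * rng.standard_normal((Cr, C))
w_k = 0.1 * rng.standard_normal((Cr, C))
w_v = 0.1 * rng.standard_normal((C, C))
out = solo_attention_head(feat, w_q, w_k, w_v)
print(out.shape)  # (8, 5, 6)
```

Setting `gamma=0.0` reduces either branch to the identity, which is a convenient way to check the residual wiring.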

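Components 2) and 3) operate on query–target pairs. As a hedged sketch, the code below uses a generic non-local-style co-attention (each target position gathers query features through a softmax-normalized affinity matrix, and vice versa) followed by a learned matcher: a two-layer MLP on concatenated query and proposal features that outputs a sigmoid similarity score, similar in spirit to relation networks. The function names, weight shapes, mean pooling of the query, and the use of raw target positions as stand-ins for RPN proposals are all illustrative assumptions; the abstract does not describe the actual architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def co_attention(query, target):
    """Exchange information between a query and a target feature map.

    query: (C, Nq) and target: (C, Nt) are flattened feature maps.
    Returns (target_aug, query_aug) with the same shapes as the inputs.
    """
    affinity = target.T @ query                      # (Nt, Nq) pairwise similarity
    t_from_q = query @ softmax(affinity, axis=1).T   # (C, Nt): each target position gathers query features
    q_from_t = target @ softmax(affinity, axis=0)    # (C, Nq): each query position gathers target features
    return target + t_from_q, query + q_from_t


def nonlinear_match(query_vec, proposals, w1, b1, w2, b2):
    """Score proposals against the query with a two-layer MLP.

    query_vec: (C,), proposals: (P, C), w1: (2C, D), b1: (D,), w2: (D,), b2: float.
    """
    tiled = np.broadcast_to(query_vec, proposals.shape)  # (P, C) copies of the query
    pairs = np.concatenate([tiled, proposals], axis=1)   # (P, 2C)
    hidden = np.maximum(pairs @ w1 + b1, 0.0)            # ReLU
    logits = hidden @ w2 + b2                            # (P,)
    return 1.0 / (1.0 + np.exp(-logits))                 # scores in (0, 1)


# Usage with random features. In a Faster R-CNN-style detector, proposal
# features would typically come from RoI pooling over RPN boxes instead.
rng = np.random.default_rng(1)
C, Nq, Nt, P, D = 8, 9, 20, 5, 16
query = rng.standard_normal((C, Nq))
target = rng.standard_normal((C, Nt))
target_aug, query_aug = co_attention(query, target)

proposals = target_aug[:, :P].T          # stand-in: first P target positions as "proposals"
w1 = 0.1 * rng.standard_normal((2 * C, D))
b1 = np.zeros(D)
w2 = 0.1 * rng.standard_normal(D)
scores = nonlinear_match(query_aug.mean(axis=1), proposals, w1, b1, w2, 0.0)
print(scores.shape)  # (5,)
```
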
Solo-to-Collaborative Dual-Attention Network for One-Shot Object Detection in Remote Sensing Images