Progressive Learning Vision Transformer for Open Set Recognition of Fine-Grained Objects in Remote Sensing Images.

Yimin Fu, Zhunga Liu, Zuowei Zhang
DOI: https://doi.org/10.1109/tgrs.2023.3309091
IF: 8.2
2023-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Open set recognition (OSR) aims to classify known classes and recognize unknown classes simultaneously. Existing OSR methods have primarily focused on learning decision boundaries based on overall feature representations, and have achieved good performance on various coarse-grained image datasets. However, the overall feature representations of objects in fine-grained image datasets are highly similar, making it difficult to distinguish between known and unknown classes by overall feature-based decision boundaries. To address this problem, we propose a progressive learning vision transformer (PLViT) with a coarse-to-fine optimization strategy. In PLViT, the overall feature representations are first optimized in the distance space to learn the initial decision boundaries. Then, a context-aware patch selection module is designed to locate the discriminative part regions. Afterward, the multilayer representations of each selected patch are aggregated according to the self-attention weights, and input into the last transformer layer to extract local feature representations. Finally, overall and local feature representations are adaptively fused and optimized in the angular space to further refine the decision boundaries. Experimental results on four fine-grained remote sensing object recognition datasets show that PLViT outperforms state-of-the-art methods.
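The abstract's notion of a decision boundary in the distance space can be illustrated with a minimal sketch. This is not PLViT's implementation: it assumes features have already been extracted, uses mean class prototypes with a Euclidean distance threshold as a generic stand-in for open set rejection, and every name in it (`class_prototypes`, `open_set_predict`, `threshold`) is hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_prototypes(features, labels):
    """Average the feature vectors of each known class into one prototype."""
    sums, counts = {}, {}
    for feature, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feature)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], feature)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def open_set_predict(feature, prototypes, threshold):
    """Return the nearest known class, or "unknown" if it is too far away.

    Assumption: a single global distance threshold defines the boundary
    between known and unknown classes.
    """
    best_label, best_dist = min(
        ((label, euclidean(feature, proto)) for label, proto in prototypes.items()),
        key=lambda pair: pair[1],
    )
    return best_label if best_dist <= threshold else "unknown"


# Toy 2-D features for two known classes (illustrative values only).
train_features = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
train_labels = ["a", "a", "b", "b"]
prototypes = class_prototypes(train_features, train_labels)

near_known = open_set_predict([0.5, 1.0], prototypes, threshold=2.0)
far_from_all = open_set_predict([5.0, 6.0], prototypes, threshold=2.0)
```

In this toy setup the first query lies close to the prototype of class "a" and is accepted, while the second lies far from both prototypes and is rejected as unknown. The abstract's point is that for fine-grained classes such overall-feature boundaries alone are insufficient, which motivates the local, part-level refinement it describes.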