A Strong Vision Transformer Adapter with Adaptive Thresholding for Fine-Grained Building Classification

Xiaoqiang Lu, Licheng Jiao, Qiong Liu, Lingling Li, Fang Liu, Xu Liu, Yuting Yang
DOI: https://doi.org/10.1109/igarss52108.2023.10281473
2023-01-01
Abstract: Fine-grained building classification provides a solid basis for comparing city morphologies and investigating urban planning. To this end, the DFC23 establishes a large-scale, multi-modal benchmark for the classification of building roof types. However, a long-tailed class distribution, insufficient data, inter-class similarity, and intra-class differences severely limit the performance of the detector. In this work, we build a strong vision transformer adapter, fine-tuned on cropped building instances, to enhance feature extraction capacity, and we design a cross-modal fusion (CMF) module to effectively aggregate features from RGB and SAR data. When transferring to building instance segmentation, we construct a robust training pipeline and a two-stage test-time result ensemble scheme. Furthermore, we introduce self-training with two key denoising techniques, global average filtering (GAF) and intra-class adaptive thresholding (IAT), to improve the generalization of the model. Experimental results demonstrate the effectiveness of our method, which ranked 2nd in the test phase of the contest.
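The abstract does not define IAT, so the following is only a rough sketch of the general idea behind class-wise thresholding of pseudo-labels in self-training, not the authors' method. It assumes each pseudo-label is a (class, confidence) pair and uses the per-class mean confidence as the threshold; the function name `class_adaptive_filter`, the `floor` parameter, and the roof-type names are hypothetical.

```python
from collections import defaultdict


def class_adaptive_filter(predictions, floor=0.0):
    """Keep pseudo-labels whose score reaches their own class's threshold.

    predictions: list of (class_id, score) pairs.
    floor: hypothetical lower bound applied to every per-class threshold.
    The threshold for each class is assumed to be that class's mean score;
    this is an illustrative choice, not taken from the paper.
    """
    scores_by_class = defaultdict(list)
    for class_id, score in predictions:
        scores_by_class[class_id].append(score)

    # One threshold per class, computed only from that class's scores.
    thresholds = {
        class_id: max(floor, sum(scores) / len(scores))
        for class_id, scores in scores_by_class.items()
    }

    kept = [(c, s) for c, s in predictions if s >= thresholds[c]]
    return kept, thresholds


# Hypothetical scores: a frequent class with high confidence
# and a rare class with lower confidence.
predictions = [("flat", 0.875), ("flat", 0.625), ("dome", 0.5), ("dome", 0.25)]
kept, thresholds = class_adaptive_filter(predictions)
```

With these made-up scores, a single global threshold of, say, 0.6 would discard every "dome" prediction, whereas the per-class threshold retains the more confident one. This is the usual motivation for class-dependent thresholds under a long-tailed distribution.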