LSRFormer: Efficient Transformer Supply Convolutional Neural Networks With Global Information for Aerial Image Segmentation
Renhe Zhang,Qian Zhang,Guixu Zhang
DOI: https://doi.org/10.1109/tgrs.2024.3366709
IF: 8.2
2024-02-27
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Both local context information and global context information are essential for the semantic segmentation of aerial images. Convolutional neural networks (CNNs) capture local context information well but cannot model global dependencies. Vision transformers (ViTs) are good at extracting global information but do not retain spatial details well. To leverage the advantages of both paradigms, we integrate them into one model in this study. However, the global token interaction of ViTs incurs a high computational cost, which makes them difficult to apply to large aerial images. To handle this problem, we propose a novel, efficient ViT block named long-short-range transformer (LSRFormer). Unlike mainstream ViTs, which are designed as backbones, LSRFormer is a pretraining-free, plug-and-play module that is appended after CNN stages to supplement global information. It is composed of long-range self-attention (LR-SA), short-range self-attention (SR-SA), and a multiscale-convolutional feed-forward network (MSC-FFN). LR-SA establishes long-range dependencies at the junctions of the windows, and SR-SA diffuses this long-range information from the window boundaries into the window interiors. MSC-FFN captures multiscale information inside the ViT block. We append an LSRFormer block after each CNN stage of a pure convolutional network to build a model named ConvLSR-Net. Compared with existing models that combine CNNs and ViTs, our model can learn both local and global representations at all stages of the model. In particular, ConvLSR-Net achieves state-of-the-art (SOTA) results on four challenging aerial image segmentation benchmarks: iSAID, LoveDA, ISPRS Potsdam, and Vaihingen. The code has been released at https://github.com/stdcoutzrh/ConvLSR-Net.
imaging science & photographic technology,remote sensing,engineering, electrical & electronic,geochemistry & geophysics
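
Note on the cost argument in the abstract: the claim that global token interaction is expensive for large images can be made concrete with a small counting exercise. The sketch below is not taken from the paper or its released code; it only compares the number of query-key pairs in full global self-attention with the number in generic attention restricted to non-overlapping windows. The function names, the window size of 8, and the feature-map sizes are illustrative assumptions, and the windowed count does not model LR-SA or SR-SA specifically.

```python
# Illustrative only: counts query-key pairs, a rough proxy for attention cost.
# Function names, window size, and map sizes are assumptions, not from the paper.

def global_attention_pairs(height, width):
    """Pairs when every token attends to every token (quadratic in tokens)."""
    tokens = height * width
    return tokens * tokens


def windowed_attention_pairs(height, width, window):
    """Pairs when attention is limited to non-overlapping window x window tiles."""
    if height % window or width % window:
        raise ValueError("feature map must be divisible by the window size")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window * tokens_per_window


if __name__ == "__main__":
    for side in (32, 64, 128):
        g = global_attention_pairs(side, side)
        w = windowed_attention_pairs(side, side, window=8)
        print(f"{side}x{side} map: global={g:,} windowed={w:,} ratio={g // w}")
```

Under these assumptions the ratio between the two counts equals the number of windows, so the gap widens as the feature map grows. Windowed attention alone loses cross-window context, which is the gap that the abstract says LR-SA and SR-SA are designed to fill.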