LKASeg: Remote-Sensing Image Semantic Segmentation with Large Kernel Attention and Full-Scale Skip Connections

Xuezhi Xiang, Yibo Ning, Lei Zhang, Denis Ombati, Himaloy Himu, Xiantong Zhen
2024-10-14
Abstract: Semantic segmentation of remote sensing images is a fundamental task in geospatial research. However, widely used Convolutional Neural Networks (CNNs) and Transformers have notable drawbacks: CNNs may be limited by insufficient modeling capability for remote sensing imagery, while Transformers face challenges due to computational complexity. In this paper, we propose a remote-sensing image semantic segmentation network named LKASeg, which combines Large Kernel Attention (LKA) and Full-Scale Skip Connections (FSC). Specifically, we propose a decoder based on LKA, which extracts global features while avoiding the computational overhead of self-attention and providing channel adaptability. To achieve full-scale feature learning and fusion, we apply FSC between the encoder and decoder. We conducted experiments by combining the LKA-based decoder with FSC. On the ISPRS Vaihingen dataset, LKASeg achieves mF1 and mIoU scores of 90.33% and 82.77%, respectively.
Computer Vision and Pattern Recognition, Artificial Intelligence, Image and Video Processing
What problem does this paper attempt to address?
This paper addresses several key challenges in semantic segmentation of remote sensing images:

1. **Limitations of traditional Convolutional Neural Networks (CNNs)**: Widely used convolutional networks such as FCN and UNet have limited ability to model complex objects in remote sensing images. They tend to lose spatial and boundary information during down-sampling, which leads to inaccurate segmentation.
2. **Computational complexity of Transformers**: Transformers capture global information through the self-attention mechanism, but at a high computational cost. For high-resolution remote sensing images, the compute and memory requirements become very large. In addition, self-attention focuses mainly on spatial adaptability and ignores channel adaptability.
3. **Scale variation and loss of spatial information**: Objects in remote sensing images vary greatly in scale, and many similar objects are densely arranged, which makes accurate segmentation difficult. Traditional CNN methods have limited ability to extract features at different scales and tend to lose spatial detail.

To solve these problems, the authors propose a new semantic segmentation network for remote sensing images, **LKASeg**. Its main innovations are:

- **A decoder based on Large Kernel Attention (LKA)**: LKA combines the advantages of convolution and self-attention, extracts global features at a lower computational cost, and provides channel adaptability.
- **Full-Scale Skip Connections (FSC)**: FSC establishes dense connections between the encoder and the decoder so that multi-scale features are learned and fused, which addresses scale variation and spatial information loss.
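The two components above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The LKA part assumes the decomposition used in the original Visual Attention Network design (a 5x5 depthwise convolution, a 7x7 depthwise convolution with dilation 3, and a 1x1 convolution whose output re-weights the input); whether LKASeg uses the same kernel sizes is an assumption. The FSC part shows only a resize-and-concatenate step in the style of UNet 3+, with max pooling for down-sampling and nearest-neighbour up-sampling chosen for brevity. The function names `depthwise_conv`, `large_kernel_attention`, `resize_to`, and `full_scale_fuse` are hypothetical.

```python
import numpy as np


def depthwise_conv(x, w, dilation=1):
    """Depthwise 2-D convolution with zero 'same' padding.

    x: (C, H, W) feature map; w: (C, k, k), one odd-sized kernel per channel.
    """
    _, h, wd = x.shape
    k = w.shape[-1]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            # Shifted view of the padded input for kernel tap (i, j).
            window = xp[:, i * dilation:i * dilation + h,
                        j * dilation:j * dilation + wd]
            out += w[:, i, j, None, None] * window
    return out


def large_kernel_attention(x, w_local, w_dilated, w_point):
    """LKA sketch (assumed sizes): 5x5 DW conv, 7x7 DW conv (dilation 3), 1x1 conv."""
    attn = depthwise_conv(x, w_local)                    # local context
    attn = depthwise_conv(attn, w_dilated, dilation=3)   # long-range context
    attn = np.einsum("oc,chw->ohw", w_point, attn)       # channel mixing
    return attn * x                                      # attention re-weights the input


def resize_to(feat, size):
    """Resize a square (C, H, H) map to (C, size, size).

    Assumes H and size differ by an integer factor.
    """
    c, h, _ = feat.shape
    if h > size:
        f = h // size
        # Max-pool with an f x f window.
        return feat.reshape(c, size, f, size, f).max(axis=(2, 4))
    f = size // h
    # Nearest-neighbour up-sampling (identity when f == 1).
    return feat.repeat(f, axis=1).repeat(f, axis=2)


def full_scale_fuse(feats, size):
    """Bring feature maps from every scale to one resolution and concatenate channels."""
    return np.concatenate([resize_to(f, size) for f in feats], axis=0)
```

In a real decoder, the fused tensor would typically pass through further convolutions before or after the attention step; that wiring is not specified in the text above and is left out here.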
Through these improvements, LKASeg retains the advantages of convolution in its network design while avoiding the drawbacks of the self-attention mechanism. Experimental results show that on the ISPRS Vaihingen dataset, LKASeg reaches mF1 and mIoU scores of 90.33% and 82.77%, respectively, an improvement over the baseline model UNetFormer. In summary, the paper introduces LKA and FSC to address insufficient feature extraction, high computational complexity, scale variation, and spatial information loss in existing semantic segmentation methods for remote sensing images, with the goal of more accurate segmentation results.
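For readers unfamiliar with the reported metrics, the following sketch shows how mF1 and mIoU are commonly computed from a class confusion matrix. It assumes per-class scores are averaged with equal weight over all classes passed in; which classes the paper includes in the average is not stated here. The function names `confusion_matrix` and `mean_f1_and_iou` are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_f1_and_iou(cm):
    """Return (mF1, mIoU) as unweighted means of the per-class F1 and IoU."""
    n = len(cm)
    f1s, ious = [], []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[r][k] for r in range(n)) - tp  # predicted k, truth differs
        fn = sum(cm[k]) - tp                       # truth k, predicted differs
        f1_den = 2 * tp + fp + fn
        iou_den = tp + fp + fn
        # A class absent from both truth and prediction scores 0 here.
        f1s.append(2 * tp / f1_den if f1_den else 0.0)
        ious.append(tp / iou_den if iou_den else 0.0)
    return sum(f1s) / n, sum(ious) / n
```

Both scores are built from the same true-positive, false-positive, and false-negative counts, and for any single class F1 is at least as large as IoU, which is consistent with the reported mF1 exceeding the reported mIoU.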