Abstract: With the development of CNNs and the application of transformers, the performance of semantic segmentation models for high-resolution remote sensing images has improved significantly. However, category imbalance in remote sensing images often biases a model's segmentation ability towards categories with more samples, resulting in suboptimal performance on categories with fewer samples. To balance the network's learning and representation capabilities across classes, we propose a category-based interactive attention and perception fusion network (CIAPNet), which divides the feature space by category to ensure fair learning and representation for each category. Specifically, the category grouping attention (CGA) module uses self-attention to reconstruct the features of each category in a grouped manner, and optimizes the foreground–background relationship and the feature representation of each category through the interactive foreground–background relationship optimization (IFBRO) module it contains. Additionally, we introduce a detail-aware fusion (DAF) module, which uses shallow detail features to complement the semantic information of deep features. Finally, a multi-scale representation (MSR) module is deployed for each class in the CGA and DAF modules to better describe information at different scales for each category. CIAPNet achieves mIoUs of 54.44%, 85.71%, and 87.88% on the LoveDA urban–rural dataset and on the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen and Potsdam urban datasets, respectively. Compared with current popular methods, our network not only achieves excellent performance but also demonstrates outstanding class balance.
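The abstract reports results as mIoU and motivates the work with class imbalance. As a minimal sketch of why mIoU exposes that imbalance, the following pure-Python example computes per-class IoU and their mean from a confusion matrix. The function names and the toy labels are assumptions made for illustration only; they are not taken from the paper or its code, and the paper's exact evaluation protocol (for example, which classes are ignored) may differ.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_iou(cm):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c); None if class c never occurs."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(ious):
    """Average the IoU over classes that occur in the labels or predictions."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid)


# Toy, made-up labels (hypothetical data): class 0 is frequent, class 1 is rare.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

cm = confusion_matrix(y_true, y_pred, num_classes=2)
ious = per_class_iou(cm)
pixel_acc = sum(cm[c][c] for c in range(len(cm))) / len(y_true)
miou = mean_iou(ious)

print("per-class IoU:", ious)
print("pixel accuracy:", pixel_acc)
print("mIoU:", miou)
```

In this toy case a single misclassified pixel of the rare class barely affects pixel accuracy but halves that class's IoU, which pulls the mean down. This is why per-class IoU and mIoU are commonly used to judge how balanced a segmentation model is across categories.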

MCAFNet: A Multiscale Channel Attention Fusion Network for Semantic Segmentation of Remote Sensing Images

MFCA-Net: a deep learning method for semantic segmentation of remote sensing images

MCAFNet: Multi-Channel Attention Fusion Network-Based CNN For Remote Sensing Scene Classification

A Crossmodal Multiscale Fusion Network for Semantic Segmentation of Remote Sensing Data

Enhanced Multi-Scale Feature Adaptive Fusion Sparse Convolutional Network for Large-Scale Scenes Semantic Segmentation

Multi-View Feature Fusion and Rich Information Refinement Network for Semantic Segmentation of Remote Sensing Images

Encoder- and Decoder-Based Networks Using Multiscale Feature Fusion and Nonlocal Block for Remote Sensing Image Semantic Segmentation

An Attention-Fused Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

CTMFNet: CNN and Transformer Multiscale Fusion Network of Remote Sensing Urban Scene Imagery

A Transformer-based Multi-Modal Fusion Network for Semantic Segmentation of High-Resolution Remote Sensing Imagery

Unsupervised Multi-Scale Hybrid Feature Extraction Network for Semantic Segmentation of High-Resolution Remote Sensing Images

Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

Category attention guided network for semantic segmentation of Fine-Resolution remote sensing images

TCNet: Multiscale Fusion of Transformer and CNN for Semantic Segmentation of Remote Sensing Images

Multi-scale attention fusion network for semantic segmentation of remote sensing images

Category-Based Interactive Attention and Perception Fusion Network for Semantic Segmentation of Remote Sensing Images

MCFNet: Multi-scale Covariance Feature Fusion Network for Real-time Semantic Segmentation

MCAT-UNet: Convolutional and Cross-Shaped Window Attention Enhanced UNet for Efficient High-Resolution Remote Sensing Image Segmentation

Multi-Field Context Fusion Network for Semantic Segmentation of High-Spatial-Resolution Remote Sensing Images

Scale-Aware Neural Network for Semantic Segmentation of Multi-Resolution Remote Sensing Images

Multiscale Fusion Convolutional Network in Real-time Semantic Segmentation