Content-guided and Class-oriented Learning for VHR Image Semantic Segmentation

Fang Liu, Keming Liu, Jia Liu, Jingxiang Yang, Xu Tang, Liang Xiao
DOI: https://doi.org/10.1109/tgrs.2024.3460081
2024-01-01
Abstract: With the flourishing of remote sensing (RS) platform techniques, very high-resolution (VHR) images have become increasingly popular in recent years, which benefits semantic segmentation but also brings new challenges. Small objects, such as cars and trees, occupy only a few pixels in VHR images and are usually hard to segment. Moreover, the feature overlap between similar ground objects, such as low vegetation and trees, typically degrades performance. In this article, a content-guided and class-oriented network (CGCO-Net) for VHR image semantic segmentation is proposed to tackle these problems. Specifically, an adaptive content-guided fusion (ACGF) module with deformable convolution is introduced to effectively capture long-distance dependencies and aggregate spatial information. Guided by the high-level features, semantic content knowledge is gradually aggregated into the low-level features while the details of the original features are preserved. In addition, a multiscale channel alignment module is introduced into the encoder-decoder structure to further extract long-range context information and reduce computational cost. To improve pixel-level classification, a class-oriented representation learning (CORL) scheme is designed using transformer blocks with class embedding and deep supervision, which gradually enhances feature discrimination and benefits the final segmentation. Furthermore, a weighted loss function and a threshold optimization strategy are employed to alleviate the sample imbalance problem. Tested on three public datasets and compared with several state-of-the-art methods, the proposed CGCO-Net achieves good performance in both qualitative and quantitative analyses.
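The abstract does not say how the weighted loss is constructed. A common way to counter pixel imbalance, where small objects such as cars contribute few pixels, is to weight each class in the cross-entropy by how rare it is. The sketch below assumes a simplified form of median frequency balancing (w_c = median frequency / frequency of class c, computed over all pixels). The function names `median_frequency_weights` and `weighted_cross_entropy` are hypothetical illustrations, not the paper's loss.

```python
import math
from collections import Counter
from statistics import median


def median_frequency_weights(labels, num_classes):
    """Simplified median frequency balancing (assumed scheme, not the paper's).

    w_c = median(freq) / freq_c, with frequencies taken over all pixels.
    Classes absent from `labels` get weight 0.0.
    """
    counts = Counter(labels)
    total = len(labels)
    freqs = {c: counts[c] / total for c in range(num_classes) if counts[c] > 0}
    med = median(freqs.values())
    return [med / freqs[c] if c in freqs else 0.0 for c in range(num_classes)]


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of per-pixel negative log-likelihood.

    probs: per-pixel lists of class probabilities; labels: per-pixel class ids.
    Each pixel's loss is scaled by its class weight, then normalized by the
    sum of the weights used.
    """
    num = 0.0
    den = 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        num += w * -math.log(p[y])
        den += w
    return num / den


# Toy example: class 0 dominates, classes 1 and 2 (e.g. "car") are rare.
labels = [0] * 6 + [1] * 2 + [2] * 2
weights = median_frequency_weights(labels, num_classes=3)
uniform = [[1 / 3, 1 / 3, 1 / 3] for _ in labels]
loss = weighted_cross_entropy(uniform, labels, weights)
```

Here the dominant class receives a weight of about 1/3 while the two rare classes receive 1.0, so errors on rare-class pixels count roughly three times as much. With uniform predictions the weighted loss still equals log(3), since every pixel has the same per-pixel loss.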
What problem does this paper attempt to address?