Semantic segmentation based on double pyramid network with improved global attention mechanism

Xianfeng Ou, Hanpu Wang, Guoyun Zhang, Wujing Li, Shuixiang Yu
DOI: https://doi.org/10.1007/s10489-023-04463-1
IF: 5.3
2023-02-16
Applied Intelligence
Abstract: Scene semantic segmentation is an important and challenging task that requires accurately labeling the category of every pixel in an image. The encoder-decoder framework, represented by the fully convolutional network (FCN), has unique advantages for semantic segmentation, but it still struggles to segment small targets and object boundaries. This paper therefore proposes a global attention double pyramid network (GADPNet), built on an improved global attention mechanism, to improve semantic segmentation performance. GADPNet consists of a ResNet-101 deep convolutional neural network, an atrous spatial pyramid pooling (ASPP) module, the proposed pyramid decoder structure, and the improved global attention module. ResNet-101 serves as the backbone and extracts features at different stages. The ASPP module captures multi-scale features from the high-level feature branch. The pyramid decoder combines the multi-scale features from the ASPP module with low-level multi-scale feature maps from different backbone stages, guided by the improved global attention module. The proposed decoder thereby enhances the ability to capture multi-scale features. GADPNet is an end-to-end network. It achieves an mIoU of 80.5% on the Pascal VOC 2012 test set and 72.9% on the Cityscapes val set, indicating that the proposed GADPNet obtains higher semantic segmentation accuracy than current methods.
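The abstract reports its results as mIoU (mean intersection over union). As a reading aid, the sketch below shows the standard per-class definition of that metric on flattened label sequences. It is not the authors' evaluation code: the function name `mean_iou`, its parameters, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration, as is the default `ignore_index=255`, which follows the common void-label convention for Pascal VOC annotations.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over flat sequences of integer class labels.

    Hypothetical helper for illustration only. Assumes every value in
    `pred` lies in [0, num_classes) and that `target` uses `ignore_index`
    for pixels that should not be scored.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel: excluded from the metric
        if p == t:
            # Correct pixel: counts once in both intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of both classes involved.
            union[t] += 1
            union[p] += 1
    # Average only over classes that appear in prediction or ground truth.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    predicted = [0, 0, 1, 1]
    ground_truth = [0, 1, 1, 1]
    # Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
    print(mean_iou(predicted, ground_truth, num_classes=2))
```

Benchmark toolkits usually accumulate a confusion matrix over the whole dataset before averaging, rather than averaging per-image scores, so the figures quoted in the abstract should be read as dataset-level values.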
computer science, artificial intelligence