A Serial-Parallel Self-Attention Network Joint with Multi-Scale Dilated Convolution.

Wang Gaihua, Zhang Tianlun, Dai Yingying, Lin Jinheng, Cheng Lei
DOI: https://doi.org/10.1109/access.2021.3079243
IF: 3.9
2021-01-01
IEEE Access
Abstract: Semantic segmentation is a high-level computer vision task that paves the way toward complete scene understanding and is widely used in autonomous driving, human-computer interaction, virtual reality, and other applications. Recently, semantic segmentation methods based on deep convolutional neural networks have become more accurate and efficient than other approaches. However, these methods still have problems: information is lost through down-sampling, image context information is underused, and the relationship between spatial features and channel features is neglected. To address these problems, this paper proposes a novel self-attention network based on a serial-parallel structure. First, a multi-scale dilated convolution backbone is constructed by combining dilated convolution with the residual network, which compensates for the information loss caused by the limited receptive field of an ordinary network and enriches the extracted features. Second, self-attention modules are stacked in serial and parallel structures, which effectively extract and fully integrate spatial, channel, and joint space-channel contextual information. Finally, the proposed algorithm is tested extensively and compared with existing classical algorithms. The experimental results show that the proposed algorithm achieves state-of-the-art performance on the public dataset.
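The abstract's point that dilated convolution enlarges the receptive field without down-sampling can be illustrated with a minimal 1-D sketch. This is not the paper's backbone: the function names and the dilation rates (1, 2, 4) are assumptions chosen for illustration, and the only fact relied on is the standard relation that a kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input positions spanned by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation form, no padding)."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart in the input.
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


def multi_scale_features(signal, kernel, dilations=(1, 2, 4)):
    """Apply the same kernel at several dilation rates (rates are assumed)."""
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}
```

With a 3-tap kernel, the three assumed rates cover 3, 5, and 9 input samples respectively, so each branch sees a different amount of context while the number of weights stays the same. A real segmentation backbone would use 2-D convolutions with padding so that all branches keep the input resolution.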