Abstract: Semantic segmentation is an active topic across diverse research fields. Driven by the success of deep convolutional neural networks, it has advanced considerably in both urban scene parsing and indoor semantic segmentation. However, most state-of-the-art models still struggle with discriminative feature learning, which limits their ability to detect multi-scale objects, to maintain semantic consistency within a single object, and to distinguish adjacent objects of similar appearance. This paper presents ELKPPNet, a practical and efficient edge-aware neural network for semantic segmentation. The end-to-end trainable model consists of a new encoder-decoder network, a large kernel spatial pyramid pooling (LKPP) block, and an edge-aware loss function. The encoder-decoder network is designed as a balanced structure to narrow the semantic and resolution gaps in multi-level feature aggregation, while the LKPP block is constructed with a densely expanding receptive field for multi-scale feature extraction and fusion. The edge-aware loss function refines boundaries directly from the semantic segmentation prediction, yielding more robust and discriminative features. The effectiveness of the proposed model is demonstrated on the Cityscapes, CamVid, and NYUDv2 benchmark datasets: the two structures and the edge-aware loss function are validated individually on Cityscapes, while the complete ELKPPNet is evaluated on CamVid and NYUDv2. A comparative analysis against state-of-the-art methods under the same conditions shows that the proposed model outperforms them.
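The abstract does not give the formulation of the edge-aware loss, so the following is only a minimal sketch of the general idea of emphasizing boundaries in a segmentation loss. It swaps in a common boundary-weighted cross-entropy, not the ELKPPNet loss itself. The function names `boundary_mask` and `edge_weighted_cross_entropy`, the `edge_weight` parameter, and the choice to derive edges from the label map are all assumptions made for illustration.

```python
import math


def boundary_mask(labels):
    """Mark pixels that have a 4-neighbour with a different class label.

    Hypothetical helper: `labels` is a 2D list of integer class ids.
    """
    h, w = len(labels), len(labels[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Checking only the right and lower neighbours covers every
            # 4-connected pair exactly once; both pixels of a differing
            # pair are marked as boundary.
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < h and nx < w and labels[y][x] != labels[ny][nx]:
                    mask[y][x] = True
                    mask[ny][nx] = True
    return mask


def edge_weighted_cross_entropy(probs, labels, edge_weight=2.0):
    """Weighted mean cross-entropy with boundary pixels up-weighted.

    Assumed layout: probs[y][x][c] is the predicted probability of
    class c at pixel (y, x). `edge_weight` is an illustrative value,
    not one taken from the paper.
    """
    mask = boundary_mask(labels)
    total, norm = 0.0, 0.0
    for y, row in enumerate(labels):
        for x, cls in enumerate(row):
            weight = edge_weight if mask[y][x] else 1.0
            # Clamp to avoid log(0) for confidently wrong predictions.
            total += -weight * math.log(max(probs[y][x][cls], 1e-12))
            norm += weight
    return total / norm


if __name__ == "__main__":
    labels = [[0, 0, 1],
              [0, 0, 1]]
    uniform = [[[0.5, 0.5] for _ in row] for row in labels]
    print(boundary_mask(labels))
    print(edge_weighted_cross_entropy(uniform, labels))
```

With this weighting, a misclassified pixel on a class boundary contributes more to the loss than an equally wrong pixel in an object's interior, which is one simple way to push a model toward sharper boundaries.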

PPNet: Pooling Position Attention Network for Semantic Segmentation

DPANET: Dual Pooling Attention Network for Semantic Segmentation

ACNET: Attention Based Network to Exploit Complementary Features for RGBD Semantic Segmentation

Adaptive Local Cross-Channel Vector Pooling Attention Module for Semantic Segmentation of Remote Sensing Imagery

Point Attention Network for Point Cloud Semantic Segmentation

Ppednet: Pyramid Pooling Encoder-Decoder Network For Real-Time Semantic Segmentation

Semantic boundary enhancement and position attention network with long-range dependency for semantic segmentation

Semantic Segmentation Network Based on Adaptive Attention and Deep Fusion Utilizing a Multi-Scale Dilated Convolutional Pyramid

FCPFNet: Feature Complementation Network with Pyramid Fusion for Semantic Segmentation

BMSeNet: Multiscale Context Pyramid Pooling and Spatial Detail Enhancement Network for Real-Time Semantic Segmentation

ELKPPNet: An Edge-aware Neural Network with Large Kernel Pyramid Pooling for Learning Discriminative Features in Semantic Segmentation

Optimizing rgb-d semantic segmentation through multi-modal interaction and pooling attention

MIPANet: optimizing RGB-D semantic segmentation through multi-modal interaction and pooling attention

Non-pooling Network for medical image segmentation

Attention Guided Global Enhancement and Local Refinement Network for Semantic Segmentation

SPANet: Successive Pooling Attention Network for Semantic Segmentation of Remote Sensing Images

IIE-SegNet: Deep Semantic Segmentation Network With Enhanced Boundary Based on Image Information Entropy

Semantic Image Segmentation with Improved Position Attention and Feature Fusion

DPNet: Dual-Pyramid Semantic Segmentation Network Based on Improved Deeplabv3 Plus

PCANet: Pyramid convolutional attention network for semantic segmentation

Semantic segmentation based on double pyramid network with improved global attention mechanism