Abstract: Semantic segmentation has been a hot topic across diverse research fields. Along with the success of deep convolutional neural networks, semantic segmentation has made great achievements and improvements, in terms of both urban scene parsing and indoor semantic segmentation. However, most state-of-the-art models still face a challenge in discriminative feature learning, which limits the ability of a model to detect multi-scale objects, to guarantee semantic consistency inside one object, or to distinguish different adjacent objects with similar appearance. In this paper, a practical and efficient edge-aware neural network (ELKPPNet) is presented for semantic segmentation. This end-to-end trainable engine consists of a new encoder-decoder network, a large kernel spatial pyramid pooling (LKPP) block, and an edge-aware loss function. The encoder-decoder network was designed as a balanced structure to narrow the semantic and resolution gaps in multi-level feature aggregation, while the LKPP block was constructed with a densely expanding receptive field for multi-scale feature extraction and fusion. Furthermore, a new edge-aware loss function is proposed to refine the boundaries directly from the semantic segmentation prediction for more robust and discriminative features. The effectiveness of the proposed model was demonstrated using the Cityscapes, CamVid, and NYUDv2 benchmark datasets. The performance of the two structures and the edge-aware loss function in ELKPPNet was validated on the Cityscapes dataset, while the complete ELKPPNet was evaluated on the CamVid and NYUDv2 datasets. A comparative analysis with the state-of-the-art methods under the same conditions confirmed the superiority of the proposed algorithm.

What problem does this paper attempt to address?

The paper attempts to address two key challenges in semantic segmentation:

1. **Multi-scale Object Detection**: When an image contains objects of different scales, a poorly matched receptive field leads to unbalanced attention across those scales. Networks with small receptive fields tend to focus on small objects and split large objects into multiple small parts, whereas networks with large receptive fields ignore details and fail to distinguish adjacent small objects.
2. **Detail Optimization**: Most deep learning methods are insensitive to detailed information, making it difficult to maintain semantic consistency within a single object (intra-class inconsistency) or to distinguish adjacent objects that are similar in appearance but different in semantics (inter-class indistinction). This results in blurred object boundaries, which may mislead the network.

To address these issues, the authors propose a novel edge-aware neural network, ELKPPNet, which aims to improve the performance of semantic segmentation by learning more discriminative features. ELKPPNet consists of the following components:

- **Balanced Encoder-Decoder Structure**: Used to narrow the semantic and resolution gaps between multi-level features while preserving geometric information.
- **Large Kernel Pyramid Pooling (LKPP) Module**: Built from Hybrid Asymmetric Dilated Convolution (HADC), it extracts rich multi-scale features at a lower computational cost and avoids the "gridding" problem caused by dilated convolutions.
- **Edge-aware Loss Function**: Directly optimizes boundary details from semantic segmentation predictions, enhancing the model's ability to capture edge information and thereby improving intra-class consistency and inter-class distinction.

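
To make the idea behind the edge-aware loss more concrete, the sketch below derives a boundary map directly from a label map and measures how much the predicted boundaries disagree with the ground-truth ones. This is only an illustration under stated assumptions: the 4-neighbour boundary rule and the helper names `boundary_map` and `boundary_error_rate` are hypothetical, and the mismatch rate is a non-differentiable stand-in, not the loss formulation used in the paper.

```python
def boundary_map(labels):
    """Return a 0/1 grid marking pixels that have a 4-neighbour with a different label.

    Assumption: this neighbour rule is one common way to obtain edges from a
    segmentation map; the paper's exact edge definition is not given here.
    """
    height, width = len(labels), len(labels[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and labels[ny][nx] != labels[y][x]:
                    edges[y][x] = 1
                    break
    return edges


def boundary_error_rate(predicted, target):
    """Fraction of pixels whose boundary flag differs between prediction and target."""
    pred_edges = boundary_map(predicted)
    true_edges = boundary_map(target)
    mismatches = sum(
        p != t
        for pred_row, true_row in zip(pred_edges, true_edges)
        for p, t in zip(pred_row, true_row)
    )
    return mismatches / (len(target) * len(target[0]))


# Toy 3x4 label maps: the prediction mislabels one pixel next to the class border.
target = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 0, 1, 1]]
predicted = [[0, 0, 0, 1],
             [0, 0, 1, 1],
             [0, 0, 1, 1]]

print(boundary_map(target))
print(boundary_error_rate(predicted, target))
```

A single mislabelled pixel near the border shifts the predicted boundary, so the boundary term reacts even though most pixels are classified correctly. In a trainable setting the same idea would presumably be applied to soft class probabilities so that gradients can flow.
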
Through these innovations, ELKPPNet has achieved excellent performance on multiple datasets, validating its effectiveness and superiority in semantic segmentation tasks.
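
The "gridding" problem and the cost argument mentioned for the LKPP module can be illustrated with a small one-dimensional calculation. The sketch below lists which input offsets reach one output position after stacking dilated convolutions, and counts kernel weights for a square versus an asymmetric kernel. The dilation rates, the function names, and the reading of "asymmetric" as the usual k×1 plus 1×k factorization are assumptions made for illustration; the summary above does not state the rates or layout that HADC actually uses.

```python
def touched_offsets(dilations, kernel_size=3):
    """1-D input offsets that contribute to one output after stacked dilated convs."""
    half = kernel_size // 2
    offsets = {0}
    for rate in dilations:
        taps = [k * rate for k in range(-half, half + 1)]
        offsets = {o + t for o in offsets for t in taps}
    return offsets


def coverage(dilations, kernel_size=3):
    """Return (number of inputs actually used, width of the receptive field)."""
    offsets = touched_offsets(dilations, kernel_size)
    span = max(offsets) - min(offsets) + 1
    return len(offsets), span


def weights_per_channel_pair(kernel_size, asymmetric):
    """Weights for one input/output channel pair: k*k for a square kernel,
    2*k for a k x 1 followed by a 1 x k kernel (assumed meaning of 'asymmetric')."""
    return 2 * kernel_size if asymmetric else kernel_size * kernel_size


# Hypothetical rate schedules chosen only to show the effect.
same_rate = (2, 2, 2)    # repeated rate: only every second input is ever sampled
mixed_rate = (1, 2, 3)   # mixed rates: every input inside the field is sampled

print(coverage(same_rate))
print(coverage(mixed_rate))
print(weights_per_channel_pair(7, asymmetric=False),
      weights_per_channel_pair(7, asymmetric=True))
```

Both schedules reach a receptive field of 13 inputs, but the repeated rate uses only 7 of them, leaving holes in a regular grid, while the mixed schedule uses all 13. For a kernel of size 7, the asymmetric factorization needs 14 weights per channel pair instead of 49.
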

ELKPPNet: An Edge-aware Neural Network with Large Kernel Pyramid Pooling for Learning Discriminative Features in Semantic Segmentation

Ppednet: Pyramid Pooling Encoder-Decoder Network For Real-Time Semantic Segmentation

DPNet: Dual-Pyramid Semantic Segmentation Network Based on Improved Deeplabv3 Plus

Attention Guided Global Enhancement and Local Refinement Network for Semantic Segmentation

FCPFNet: Feature Complementation Network with Pyramid Fusion for Semantic Segmentation

ELANet: an efficiently lightweight asymmetrical network for real-time semantic segmentation

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Efficiently Expanding Receptive Fields: Local Split Attention and Parallel Aggregation for Enhanced Large-scale Point Cloud Semantic Segmentation

PPNet: Pooling Position Attention Network for Semantic Segmentation

ELANet: Effective Lightweight Attention-Guided Network for Real-Time Semantic Segmentation

BMSeNet: Multiscale Context Pyramid Pooling and Spatial Detail Enhancement Network for Real-Time Semantic Segmentation

Edge-Enhanced GCIFFNet: A Multiclass Semantic Segmentation Network Based on Edge Enhancement and Multiscale Attention Mechanism

Efficient Multi-scale Network for Semantic Segmentation of fine-Resolution Remotely Sensed Images

Semantic segmentation based on enhanced gated pyramid network with lightweight attention module

PPANet: Point-Wise Pyramid Attention Network for Semantic Segmentation.

Enhanced Feature Pyramid Network for Semantic Segmentation.

Semantic Relocation Parallel Network for Semantic Segmentation

GPNet: Gated pyramid network for semantic segmentation

Cross Guided and Pyramid Aggregation Networks for Real-time Semantic Segmentation

DPANET: Dual Pooling Attention Network for Semantic Segmentation

LMANet: A Lightweight Asymmetric Semantic Segmentation Network Based on Multi-Scale Feature Extraction