ELKPPNet: An Edge-aware Neural Network with Large Kernel Pyramid Pooling for Learning Discriminative Features in Semantic Segmentation

Xianwei Zheng, Linxi Huan, Hanjiang Xiong, Jianya Gong
DOI: https://doi.org/10.48550/arXiv.1906.11428
2019-06-27
Abstract: Semantic segmentation has been a hot topic across diverse research fields. Along with the success of deep convolutional neural networks, semantic segmentation has made great achievements and improvements, in terms of both urban scene parsing and indoor semantic segmentation. However, most of the state-of-the-art models are still faced with a challenge in discriminative feature learning, which limits the ability of a model to detect multi-scale objects and to guarantee semantic consistency inside one object or distinguish different adjacent objects with similar appearance. In this paper, a practical and efficient edge-aware neural network is presented for semantic segmentation. This end-to-end trainable engine consists of a new encoder-decoder network, a large kernel spatial pyramid pooling (LKPP) block, and an edge-aware loss function. The encoder-decoder network was designed as a balanced structure to narrow the semantic and resolution gaps in multi-level feature aggregation, while the LKPP block was constructed with a densely expanding receptive field for multi-scale feature extraction and fusion. Furthermore, the new powerful edge-aware loss function is proposed to refine the boundaries directly from the semantic segmentation prediction for more robust and discriminative features. The effectiveness of the proposed model was demonstrated using Cityscapes, CamVid, and NYUDv2 benchmark datasets. The performance of the two structures and the edge-aware loss function in ELKPPNet was validated on the Cityscapes dataset, while the complete ELKPPNet was evaluated on the CamVid and NYUDv2 datasets. A comparative analysis with the state-of-the-art methods under the same conditions confirmed the superiority of the proposed algorithm.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper attempts to address two key challenges in semantic segmentation:
1. **Multi-scale Object Detection**: When an image contains objects of different scales, a poorly matched receptive field size leads the network to attend unevenly to them. Networks with small receptive fields tend to focus on small objects and split large objects into several parts, whereas networks with large receptive fields overlook details and fail to separate adjacent small objects.
2. **Detail Optimization**: Most deep learning methods are insensitive to fine detail, which makes it difficult to maintain semantic consistency within a single object (intra-class inconsistency) or to distinguish adjacent objects that look similar but belong to different classes (inter-class indistinction). The result is blurred object boundaries, which may mislead the network.
To address these issues, the authors propose a novel edge-aware neural network, ELKPPNet, which aims to improve semantic segmentation by learning more discriminative features. ELKPPNet consists of the following components:
- **Balanced Encoder-Decoder Structure**: Narrows the semantic and resolution gaps between multi-level features while preserving geometric information.
- **Large Kernel Pyramid Pooling (LKPP) Module**: Built on Hybrid Asymmetric Dilated Convolution (HADC), it extracts rich multi-scale features at a lower computational cost and avoids the "gridding" problem caused by dilated convolutions.
- **Edge-aware Loss Function**: Refines boundary details directly from the semantic segmentation prediction, strengthening the model's ability to capture edge information and thereby improving intra-class consistency and inter-class distinction.
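The "gridding" problem and the cost argument for the LKPP module can be made concrete with a small 1D calculation. The sketch below is illustrative only: the dilation rates `[2, 2, 2]` and `[1, 2, 3]`, the kernel size, and the function names are assumptions chosen for the example, not the configuration used in ELKPPNet. It also assumes that "asymmetric" means factorizing a k×k kernel into a k×1 and a 1×k kernel, which is the usual meaning of the term but is not spelled out in this summary.

```python
from itertools import product


def reachable_offsets(dilations, kernel_size=3):
    """Return the sorted 1D input offsets that can influence a single output
    position after stacking one dilated convolution per entry in `dilations`."""
    half = kernel_size // 2
    # Tap positions of each layer, relative to the layer's output position.
    taps_per_layer = [
        [rate * tap for tap in range(-half, half + 1)] for rate in dilations
    ]
    # Every path through the stack picks one tap per layer; offsets add up.
    return sorted({sum(path) for path in product(*taps_per_layer)})


def coverage(dilations, kernel_size=3):
    """Return (number of input positions actually used, width of the span)."""
    offsets = reachable_offsets(dilations, kernel_size)
    span = offsets[-1] - offsets[0] + 1
    return len(offsets), span


def weights_per_channel_pair(k, asymmetric):
    """Weights for one input/output channel pair: k*k for a full kernel,
    2*k when factorized into k x 1 followed by 1 x k (assumed meaning)."""
    return 2 * k if asymmetric else k * k


if __name__ == "__main__":
    # Same rate in every layer: only every second input position is used.
    print("rates [2, 2, 2]:", coverage([2, 2, 2]))
    # Mixed rates: the same span is covered without holes.
    print("rates [1, 2, 3]:", coverage([1, 2, 3]))
    print("7x7 full kernel:", weights_per_channel_pair(7, asymmetric=False))
    print("7x1 + 1x7 pair: ", weights_per_channel_pair(7, asymmetric=True))
```

With a repeated rate the stack samples its receptive field on a sparse grid, while mixed rates fill the same span densely. The weight count shows why factorized large kernels are cheaper than full ones.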
With these components, the authors report that ELKPPNet outperforms state-of-the-art methods under the same conditions. The two structures and the edge-aware loss were validated on Cityscapes, and the complete network was evaluated on CamVid and NYUDv2.
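The idea of deriving boundary information directly from a segmentation map, which underlies the edge-aware loss, can be sketched in a few lines. This is not the loss defined in the paper, whose exact formulation is not given in this summary. The helper names `boundary_mask` and `edge_disagreement`, the 4-neighbour boundary rule, and the toy label maps are all hypothetical, chosen only to show how edges can be read off a label map and compared between a prediction and the ground truth.

```python
def boundary_mask(labels):
    """Mark pixels (1) whose class differs from a horizontal or vertical
    neighbour. `labels` is a non-empty list of equal-length rows of class ids."""
    height, width = len(labels), len(labels[0])
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Checking right and down visits every neighbouring pair once.
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < height and nx < width and labels[y][x] != labels[ny][nx]:
                    mask[y][x] = 1
                    mask[ny][nx] = 1
    return mask


def edge_disagreement(pred_labels, true_labels):
    """Fraction of pixels whose boundary status differs between the predicted
    and ground-truth label maps (an illustrative score, not the paper's loss)."""
    pred_mask = boundary_mask(pred_labels)
    true_mask = boundary_mask(true_labels)
    differing = sum(
        p != t
        for pred_row, true_row in zip(pred_mask, true_mask)
        for p, t in zip(pred_row, true_row)
    )
    return differing / (len(pred_mask) * len(pred_mask[0]))


if __name__ == "__main__":
    truth = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ]
    # Same map with one mislabelled pixel next to the class boundary.
    prediction = [
        [0, 0, 0, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ]
    print(boundary_mask(truth))
    print(edge_disagreement(prediction, truth))
```

A single mislabelled pixel near a boundary shifts the extracted edge, so a penalty on edge disagreement concentrates on exactly the regions where adjacent classes are confused. A trainable version would need a differentiable edge operator applied to class probabilities rather than hard labels.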