An Enhanced Encoder-Decoder Network Architecture for Reducing Information Loss in Image Semantic Segmentation

Zijun Gao, Qi Wang, Taiyuan Mei, Xiaohan Cheng, Yun Zi, Haowei Yang
2024-05-26
Abstract: The traditional SegNet architecture commonly suffers significant information loss during the sampling process, which degrades its accuracy in image semantic segmentation tasks. To counter this, we introduce an encoder-decoder network structure enhanced with residual connections. Our approach employs a multi-residual connection strategy designed to preserve fine details across image scales more effectively, thus minimizing the information loss inherent to down-sampling. Additionally, to speed up the convergence of network training and mitigate sample imbalance, we have devised a modified cross-entropy loss function incorporating a balancing factor. This modification balances the contributions of positive and negative samples, thus improving the efficiency of model training. Experimental evaluations of our model show a substantial reduction in information loss and improved accuracy in semantic segmentation. Notably, our proposed network architecture achieves a substantially higher mean Intersection over Union (mIoU) on the finely annotated dataset than the conventional SegNet. The proposed network structure not only reduces operational costs by decreasing manual inspection needs but also supports broader deployment of AI-driven image analysis across different sectors.
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the significant information loss in traditional SegNet architectures for image semantic segmentation. Specifically, it points out that traditional SegNet loses a substantial amount of information during downsampling in the encoding stage, which reduces segmentation accuracy. To tackle this, the authors propose an encoder-decoder network structure that introduces residual connections to reduce information loss and improve segmentation accuracy. The main improvements include:

1. **Multi-Residual Connection Strategy**: This strategy helps retain more detailed information at different image scales, thereby enhancing the network's ability to perform accurate segmentation.
2. **Improved Cross-Entropy Loss Function**: To speed up convergence during training and address sample imbalance, the authors designed a modified version of the cross-entropy loss function that includes a balancing factor. This balances the contributions of positive and negative samples, improving the efficiency of model training.

Experimental results show that the proposed network architecture both reduces information loss and improves the accuracy of semantic segmentation. In particular, it outperforms traditional SegNet on the mean Intersection over Union (mIoU) metric. Additionally, by reducing the need for manual inspection, the method supports the deployment of AI-driven image analysis across various fields.
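The summary above does not give the exact form of the balancing factor, so the following is only a minimal sketch of one common way to weight cross-entropy against sample imbalance, not the authors' formulation. It assumes a binary (positive/negative) per-pixel setting and a weight `beta` on the positive term with `1 - beta` on the negative term; the function name `balanced_bce`, the parameter names, and the default choice of `beta` (the fraction of negative pixels) are all hypothetical.

```python
import math


def balanced_bce(probs, labels, beta=None, eps=1e-7):
    """Mean class-balanced binary cross-entropy over a flat list of pixels.

    Per-pixel loss (assumed form, not taken from the paper):
        -[beta * y * log(p) + (1 - beta) * (1 - y) * log(1 - p)]

    probs  : predicted probabilities of the positive class, each in [0, 1]
    labels : ground-truth labels, 1 = positive, 0 = negative
    beta   : weight on the positive term. If None, it is set to the fraction
             of negative pixels, so the rarer class gets the larger weight.
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and of equal length")

    if beta is None:
        negatives = sum(1 for y in labels if y == 0)
        beta = negatives / len(labels)

    total = 0.0
    for p, y in zip(probs, labels):
        # Clamp the prediction so log() never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        total -= beta * y * math.log(p) + (1.0 - beta) * (1 - y) * math.log(1.0 - p)
    return total / len(probs)


if __name__ == "__main__":
    # One positive pixel among three negatives, with the positive pixel missed.
    probs = [0.1, 0.1, 0.1, 0.1]
    labels = [1, 0, 0, 0]
    print("automatic beta:", balanced_bce(probs, labels))
    print("beta = 0.5    :", balanced_bce(probs, labels, beta=0.5))
```

With `beta = 0.5` the loss is plain binary cross-entropy scaled by one half. When positives are rare, the automatic choice raises the weight on the positive term, so a missed positive pixel contributes more to the loss than it would with equal weights.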