Polyp segmentation in colonoscopy images using DeepLabV3++

Al Mohimanul Islam, Sadia Shakiba Bhuiyan, Mysun Mashira, Md. Rayhan Ahmed, Salekul Islam, Swakkhar Shatabda
2024-07-28
Abstract: Segmenting polyps in colonoscopy images is essential for the early identification and diagnosis of colorectal cancer, a significant cause of cancer deaths worldwide. Prior deep-learning-based models, such as attention-based variants, UNet variants, and Transformer-derived networks, have had notable success in capturing intricate features and complex polyp shapes. In this study, we introduce the DeepLabV3++ model, an enhanced version of the DeepLabV3+ architecture designed to improve the precision and robustness of polyp segmentation in colonoscopy images. The proposed model incorporates diverse separable convolutional layers and attention mechanisms within the MSPP block, enhancing its capacity to capture multi-scale and directional features. Additionally, the redesigned decoder further transforms the features extracted by the encoder into a more meaningful segmentation map. Our model was evaluated on three public datasets (CVC-ColonDB, CVC-ClinicDB, Kvasir-SEG), achieving Dice coefficient scores of 96.20%, 96.54%, and 96.08%, respectively. The experimental analysis shows that DeepLabV3++ outperforms several state-of-the-art models in polyp segmentation tasks. Furthermore, compared to the baseline DeepLabV3+ model, DeepLabV3++, with its MSPP module and redesigned decoder architecture, significantly reduced segmentation errors (e.g., false positives/negatives) across small, medium, and large polyps. This improvement in polyp delineation is crucial for accurate clinical decision-making in colonoscopy.
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the accurate segmentation of polyps in colonoscopy images. Early identification of polyps is crucial for the prevention and diagnosis of colorectal cancer. Although existing deep-learning models have been notably successful at capturing complex features and polyp shapes, they still struggle to capture fine details and to strengthen both local and global feature representations. The paper therefore proposes DeepLabV3++, an improved model aimed at more accurate and robust polyp segmentation.

### Main contributions of the paper

1. **A robust segmentation framework built on an EfficientNetV2S encoder**: the framework pairs this encoder with an efficient decoding module to improve the performance of DeepLabV3++.
2. **The Multi-Scale Pyramid Pooling (MSPP) module**: by using diverse separable convolutions with different kernel sizes, the module extracts features at multiple scales while remaining computationally efficient. It also includes skip connections that preserve spatial information and improve gradient flow.
3. **The Parallel Attention Aggregation Block (PAAB)**: the PAAB strengthens the model's spatial feature representation by efficiently aggregating spatial and channel information. Its spatial attention mechanism uses multi-kernel separable convolutions to sharpen spatial representations, while its channel attention mechanism models the interdependencies among the extracted features.
4. **A comprehensive comparative analysis on three public benchmark datasets**: the experiments show that DeepLabV3++ outperforms existing state-of-the-art polyp-segmentation models on multiple metrics and, compared with the DeepLabV3+ baseline, reduces segmentation errors for small, medium, and large polyps.

### Overview of the model architecture

#### 1. Encoder

- **EfficientNetV2S**: serves as the encoder backbone for initial feature extraction. EfficientNetV2S has a comparatively small model size and trains efficiently, which makes it suitable for a wide range of computer vision tasks.
- **Multi-Scale Pyramid Pooling (MSPP)**: replaces the conventional Atrous Spatial Pyramid Pooling (ASPP) and captures multi-scale context more efficiently by combining different convolution and pooling layers. The MSPP module contains several 3×3 separable convolution layers with dilation rates 4, 8, and 12, together with 5×1 and 1×5 separable convolution layers that capture directional (vertical and horizontal) information.

#### 2. Parallel Attention Aggregation Block (PAAB)

- **Spatial attention mechanism**: average pooling and max pooling produce average-pooled and max-pooled feature maps, which are passed through three separable convolution layers with kernel sizes 3, 5, and 7 to produce a 2D spatial attention map.
- **Channel attention mechanism**: global average pooling (GAP) and global max pooling (GMP) produce channel descriptors, which two dense layers turn into the final channel attention map.

#### 3. Decoder

- **Upsampling**: the decoder first upsamples the encoder output to a higher resolution.
- **Feature fusion**: the upsampled features are concatenated with low-level encoder features through skip connections. A series of convolution layers and skip connections then refines the segmentation map, and a final upsampling step restores the input resolution to produce the polyp segmentation.

### Experimental results

The model was evaluated on three public benchmark datasets (CVC-ColonDB, CVC-ClinicDB, Kvasir-SEG), reaching Dice coefficients of 96.20%, 96.54%, and 96.08%, respectively.
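For reference, the Dice coefficient measures the overlap between a predicted mask P and a ground-truth mask G as 2|P ∩ G| / (|P| + |G|). The NumPy sketch below computes it for binary masks; the function name and the small smoothing term `eps` are illustrative choices, and this summary does not say whether the paper averages Dice per image or over a whole dataset.

```python
import numpy as np


def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|P ∩ G| / (|P| + |G|) for binary masks P (prediction) and G (ground truth)."""
    p = np.asarray(pred).astype(bool)
    g = np.asarray(target).astype(bool)
    intersection = np.logical_and(p, g).sum()
    total = p.sum() + g.sum()
    # eps keeps the score defined (and equal to 1.0) when both masks are empty
    return float((2.0 * intersection + eps) / (total + eps))


# Toy 4x4 example: the ground-truth polyp covers 4 pixels,
# the prediction covers 6 pixels, 4 of which overlap the polyp.
gt = np.zeros((4, 4), dtype=np.uint8)
gt[1:3, 1:3] = 1
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 0:3] = 1
print(round(dice_coefficient(pred, gt), 4))  # 2*4 / (6 + 4) = 0.8
```

Because Dice is normalized by mask size, the same number of false-positive or false-negative pixels lowers the score of a small polyp more than that of a large one, which is why error breakdowns by polyp size are informative.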
The experimental analysis shows that DeepLabV3++ outperforms several state-of-the-art models on the polyp-segmentation task and, compared with the DeepLabV3+ baseline, reduces false positives and false negatives across small, medium, and large polyps.

### Conclusion

By combining the EfficientNetV2S encoder, the MSPP module, the PAAB module, and a redesigned decoder, DeepLabV3++ improves the accuracy and robustness of polyp segmentation in colonoscopy images, providing more reliable support for clinical decision-making.
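The efficiency argument behind MSPP's separable convolutions, and the reach of its dilated 3×3 branches, can be checked with simple arithmetic. This sketch assumes 256 input and output channels (a hypothetical width; the summary gives none), ignores biases, and compares the weight count of a standard convolution with that of a depthwise separable one (a per-channel k_h×k_w depthwise filter followed by a 1×1 pointwise convolution). It also computes how many pixels a 3×3 kernel spans at dilation rates 4, 8, and 12.

```python
def standard_conv_params(kh, kw, c_in, c_out):
    # one kh x kw x c_in filter per output channel (biases ignored)
    return kh * kw * c_in * c_out


def separable_conv_params(kh, kw, c_in, c_out):
    # depthwise: one kh x kw filter per input channel; pointwise: 1x1 conv mixing channels
    return kh * kw * c_in + c_in * c_out


def effective_span(k, dilation):
    # a k-tap kernel with dilation d covers k + (k - 1) * (d - 1) pixels along that axis
    return k + (k - 1) * (dilation - 1)


C_IN = C_OUT = 256  # hypothetical channel widths, not taken from the paper

for name, (kh, kw) in {"3x3": (3, 3), "5x1": (5, 1), "1x5": (1, 5)}.items():
    std = standard_conv_params(kh, kw, C_IN, C_OUT)
    sep = separable_conv_params(kh, kw, C_IN, C_OUT)
    print(f"{name}: standard={std:,}  separable={sep:,}  ({std / sep:.1f}x fewer weights)")

for d in (4, 8, 12):
    span = effective_span(3, d)
    print(f"3x3 kernel, dilation {d}: covers {span}x{span} pixels with 9 taps per channel")
```

Under these assumed widths, a 3×3 separable layer needs 67,840 weights instead of 589,824, and the dilated branches cover 9×9, 17×17, and 25×25 pixels while each still uses only nine taps per channel.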
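The summary names MSPP's branches but not how they are wired together. The NumPy sketch below shows one hypothetical arrangement, loosely modeled on ASPP: five parallel separable branches (3×3 at dilations 4, 8, and 12, plus 5×1 and 1×5), an image-level average-pooling branch, concatenation, a 1×1 projection back to the input width, and a residual skip connection. The branch widths, the ReLU, the pooling branch, and the residual placement are assumptions, weights are random placeholders, and the attention mechanisms the abstract places inside MSPP are omitted.

```python
import numpy as np


def dilated_depthwise(x, kernel, dilation=1):
    """Per-channel 'same' convolution. x: (H, W, C); kernel: (kh, kw, C) with odd kh, kw."""
    kh, kw, _ = kernel.shape
    ph, pw = (kh - 1) * dilation // 2, (kw - 1) * dilation // 2
    h, w, _ = x.shape
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            # each tap reads the input shifted by a multiple of the dilation rate
            out += xp[i * dilation:i * dilation + h, j * dilation:j * dilation + w, :] * kernel[i, j, :]
    return out


def mspp(x, params):
    """Hypothetical MSPP: parallel separable branches + pooling branch, concat, 1x1 projection, residual skip."""
    outputs = []
    for depthwise, pointwise, dilation in params["branches"]:
        # separable conv = depthwise (spatial) then pointwise 1x1 (channel mixing), followed by ReLU
        outputs.append(np.maximum(dilated_depthwise(x, depthwise, dilation) @ pointwise, 0.0))
    # image-level context: global average pool, project, broadcast back over H x W
    pooled = x.mean(axis=(0, 1)) @ params["pool_proj"]
    outputs.append(np.broadcast_to(pooled, x.shape[:2] + pooled.shape))
    fused = np.concatenate(outputs, axis=-1) @ params["project"]
    return fused + x  # residual skip connection keeps the input's spatial detail


rng = np.random.default_rng(0)
C, B = 8, 4  # hypothetical input channels and per-branch output channels
specs = [((3, 3), 4), ((3, 3), 8), ((3, 3), 12), ((5, 1), 1), ((1, 5), 1)]
params = {
    "branches": [(rng.standard_normal(k + (C,)) * 0.1, rng.standard_normal((C, B)) * 0.1, d) for k, d in specs],
    "pool_proj": rng.standard_normal((C, B)) * 0.1,
    "project": rng.standard_normal(((len(specs) + 1) * B, C)) * 0.1,
}
x = rng.standard_normal((32, 32, C))
print(mspp(x, params).shape)  # (32, 32, 8)
```

Concatenating the branches lets the 1×1 projection weigh wide dilated context against the directional 5×1 and 1×5 responses, and the residual addition means the block can fall back to passing its input through unchanged.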
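The spatial attention mechanism of the PAAB can be sketched in NumPy as follows. Assumptions not stated in the summary: pooling is taken across the channel axis (as in CBAM); the average- and max-pooled maps are stacked into a two-channel input; each of the three separable branches (kernel sizes 3, 5, and 7) outputs one channel; and the branch outputs are summed before a sigmoid. All weights are random placeholders.

```python
import numpy as np


def depthwise_separable_conv(x, depthwise, pointwise):
    """x: (H, W, C_in); depthwise: (k, k, C_in) with odd k; pointwise: (C_in, C_out). 'same' padding, stride 1."""
    k = depthwise.shape[0]
    pad = k // 2
    h, w, _ = x.shape
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            # each channel is filtered by its own k x k kernel (depthwise step)
            out += xp[i:i + h, j:j + w, :] * depthwise[i, j, :]
    return out @ pointwise  # 1x1 pointwise step mixes channels


def spatial_attention(x, branches):
    """x: (H, W, C). branches: (depthwise, pointwise) weight pairs, one per kernel size."""
    avg = x.mean(axis=-1, keepdims=True)          # (H, W, 1): average over channels
    mx = x.max(axis=-1, keepdims=True)            # (H, W, 1): max over channels
    pooled = np.concatenate([avg, mx], axis=-1)   # (H, W, 2)
    # assumption: the three branch outputs (each (H, W, 1)) are summed before the sigmoid
    logits = sum(depthwise_separable_conv(pooled, dw, pw) for dw, pw in branches)
    attn = 1.0 / (1.0 + np.exp(-logits))          # 2D attention map in (0, 1), shape (H, W, 1)
    return attn, x * attn                          # reweight every channel at each location


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16, 8))
branches = [(rng.standard_normal((k, k, 2)) * 0.1, rng.standard_normal((2, 1)) * 0.1) for k in (3, 5, 7)]
attn, refined = spatial_attention(x, branches)
print(attn.shape, refined.shape)  # (16, 16, 1) (16, 16, 8)
```

Using three kernel sizes lets the map respond to both tight and broad spatial patterns, which fits polyps whose size varies widely between images.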
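A companion sketch of the PAAB's channel attention mechanism. The summary says only that GAP and GMP descriptors pass through two dense layers; this version assumes the CBAM-style arrangement in which both descriptors share the same two dense layers (with a hypothetical reduction ratio r = 4 and a ReLU in between), and their outputs are summed and passed through a sigmoid.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, b1, w2, b2):
    """x: (H, W, C). Returns per-channel weights of shape (C,) and the reweighted feature map."""
    gap = x.mean(axis=(0, 1))  # global average pooling -> (C,)
    gmp = x.max(axis=(0, 1))   # global max pooling -> (C,)

    def dense_pair(v):
        # assumption: both descriptors share the same two dense layers (CBAM-style)
        hidden = np.maximum(v @ w1 + b1, 0.0)  # dense layer 1 + ReLU: C -> C // r
        return hidden @ w2 + b2                # dense layer 2: C // r -> C

    weights = sigmoid(dense_pair(gap) + dense_pair(gmp))  # final channel attention map
    return weights, x * weights  # broadcasts the (C,) weights over H and W


rng = np.random.default_rng(0)
C, r = 16, 4  # hypothetical channel count and reduction ratio
x = rng.standard_normal((8, 8, C))
w1, b1 = rng.standard_normal((C, C // r)) * 0.1, np.zeros(C // r)
w2, b2 = rng.standard_normal((C // r, C)) * 0.1, np.zeros(C)
weights, refined = channel_attention(x, w1, b1, w2, b2)
print(weights.shape, refined.shape)  # (16,) (8, 8, 16)
```

How the PAAB aggregates its spatial and channel branches in parallel (for example, by summing or concatenating their outputs) is not specified in this summary, so that step is left out.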
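The decoder's data flow (upsample, concatenate with low-level features, refine, upsample again) can be traced with a shape-level NumPy sketch. The strides and widths are hypothetical (deep features at 1/16 and low-level skip features at 1/4 of a 256×256 input, following the usual DeepLabV3+ setup). Nearest-neighbor repetition stands in for bilinear upsampling, and a single 1×1 convolution stands in for the paper's series of refinement convolutions and skip connections.

```python
import numpy as np


def upsample_nearest(x, factor):
    """(H, W, C) -> (H*factor, W*factor, C) by repeating pixels (a stand-in for bilinear upsampling)."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)


def decode(high, low, w_fuse, w_out, input_size):
    """high: deep encoder features; low: shallow encoder features at a higher resolution."""
    up = upsample_nearest(high, low.shape[0] // high.shape[0])        # match the skip features' size
    fused = np.concatenate([up, low], axis=-1)                        # skip connection by concatenation
    refined = np.maximum(fused @ w_fuse, 0.0)                         # 1x1 conv + ReLU (simplified refinement)
    logits = refined @ w_out                                          # 1x1 conv to a single polyp channel
    logits = upsample_nearest(logits, input_size // logits.shape[0])  # back to input resolution
    return 1.0 / (1.0 + np.exp(-logits))                              # per-pixel polyp probability


rng = np.random.default_rng(0)
high = rng.standard_normal((16, 16, 64))   # hypothetical 1/16-resolution encoder output
low = rng.standard_normal((64, 64, 24))    # hypothetical 1/4-resolution low-level features
w_fuse = rng.standard_normal((64 + 24, 32)) * 0.1
w_out = rng.standard_normal((32, 1)) * 0.1
mask = decode(high, low, w_fuse, w_out, input_size=256)
print(mask.shape)  # (256, 256, 1)
```

Concatenating the low-level features before the final upsampling is what lets the decoder recover sharp polyp boundaries that the coarse 1/16-resolution features alone cannot localize.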