Abstract: Segmenting polyps in colonoscopy images is essential for the early identification and diagnosis of colorectal cancer, a significant cause of cancer deaths worldwide. Prior deep-learning models, such as attention-based variants, UNet variants, and Transformer-derived networks, have had notable success in capturing intricate features and complex polyp shapes. In this study, we introduce the DeepLabV3++ model, an enhanced version of the DeepLabV3+ architecture designed to improve the precision and robustness of polyp segmentation in colonoscopy images. The proposed model incorporates diverse separable convolutional layers and attention mechanisms within the Multi-Scale Pyramid Pooling (MSPP) block, enhancing its capacity to capture multi-scale and directional features. Additionally, the redesigned decoder further transforms the features extracted by the encoder into a more meaningful segmentation map. Our model was evaluated on three public datasets (CVC-ColonDB, CVC-ClinicDB, and Kvasir-SEG), achieving Dice coefficient scores of 96.20%, 96.54%, and 96.08%, respectively. The experimental analysis shows that DeepLabV3++ outperforms several state-of-the-art models in polyp segmentation tasks. Furthermore, compared with the baseline DeepLabV3+ model, DeepLabV3++, with its MSPP module and redesigned decoder, significantly reduced segmentation errors (e.g., false positives and false negatives) across small, medium, and large polyps. This improvement in polyp delineation is crucial for accurate clinical decision-making in colonoscopy.
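The efficiency argument behind separable and directional convolutions comes down to simple arithmetic. The following is a minimal sketch, not the authors' code. It counts the weights of a depthwise separable convolution against a standard convolution, where the separable version is a per-channel k_h×k_w depthwise filter followed by a 1×1 pointwise convolution, with biases ignored. It also computes the span covered by a dilated kernel, (k - 1) * r + 1. The dilation rates 4, 8, and 12 and the 5×1/1×5 kernels come from the MSPP description in this document. The channel width of 256 and a dilation rate of 1 for the directional kernels are hypothetical values chosen only for illustration. The function names are also hypothetical.

```python
def conv_params(k_h, k_w, c_in, c_out):
    """Weights in a standard k_h x k_w convolution (bias ignored)."""
    return k_h * k_w * c_in * c_out


def separable_conv_params(k_h, k_w, c_in, c_out):
    """Depthwise k_h x k_w filter per input channel + 1x1 pointwise conv (bias ignored)."""
    return k_h * k_w * c_in + c_in * c_out


def dilated_extent(k, rate):
    """Pixels spanned along one axis by a k-tap kernel with dilation `rate`."""
    return (k - 1) * rate + 1


CHANNELS = 256  # HYPOTHETICAL channel width, chosen only for illustration

# (label, kernel height, kernel width, dilation rate); rate 1 for 5x1/1x5 is an ASSUMPTION
BRANCHES = [
    ("3x3 separable, rate 4", 3, 3, 4),
    ("3x3 separable, rate 8", 3, 3, 8),
    ("3x3 separable, rate 12", 3, 3, 12),
    ("5x1 separable", 5, 1, 1),
    ("1x5 separable", 1, 5, 1),
]

for label, kh, kw, rate in BRANCHES:
    span_h, span_w = dilated_extent(kh, rate), dilated_extent(kw, rate)
    sep = separable_conv_params(kh, kw, CHANNELS, CHANNELS)
    std = conv_params(kh, kw, CHANNELS, CHANNELS)
    print(f"{label}: spans {span_h}x{span_w} px, {sep:,} weights vs {std:,} for a standard conv")
```

With these assumed widths, each 3×3 separable branch needs 67,840 weights instead of 589,824 (about 8.7× fewer). The three dilation rates widen its span from 9×9 to 25×25 pixels without adding any weights. The 5×1 and 1×5 kernels instead stretch the receptive field along a single axis, which is one way to read "directional" features.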

What problem does this paper attempt to address?

The problem that this paper attempts to solve is the accurate segmentation of polyps in colonoscopy images. Early identification of polyps is crucial for the prevention and diagnosis of colorectal cancer. Existing deep-learning models have achieved remarkable success in capturing complex features and polyp shapes. However, they still struggle to capture fine details accurately and to strengthen local and global feature representations. This paper therefore proposes an improved DeepLabV3++ model that aims to raise the accuracy and robustness of polyp segmentation.

### Main contributions of the paper

1. **A robust segmentation framework based on the EfficientNetV2S encoder**: This framework enhances the performance of the DeepLabV3++ model through an efficient decoding module.
2. **The Multi-Scale Pyramid Pooling (MSPP) module**: This module uses diverse separable convolutions with different kernel sizes. It effectively extracts features from feature maps at different scales while maintaining computational efficiency. The MSPP module also includes skip connections that preserve spatial information and improve gradient flow.
3. **The Parallel Attention Aggregation Block (PAAB)**: The PAAB strengthens the model's spatial feature representation by efficiently aggregating spatial and channel information. The spatial attention mechanism uses multi-kernel separable convolutions to improve spatial feature representation. The channel attention mechanism models the interdependence between the extracted features.
4. **A comprehensive comparative analysis on three publicly available benchmark datasets**: The experimental results show that DeepLabV3++ outperforms existing advanced polyp-segmentation models on multiple metrics. It significantly reduces segmentation errors across small, medium, and large polyps.

### Overview of the model architecture

#### 1. Encoder

- **EfficientNetV2S**: The backbone of the encoder, used for initial feature extraction. EfficientNetV2S has a smaller model size and higher training efficiency, which makes it suitable for a wide range of computer vision tasks.
- **Multi-Scale Pyramid Pooling (MSPP)**: MSPP replaces the traditional Atrous Spatial Pyramid Pooling (ASPP). It captures multi-scale context information more efficiently by combining different convolution and pooling layers. The MSPP module includes multiple 3×3 separable convolution layers with different dilation rates (4, 8, 12), as well as 5×1 and 1×5 separable convolution layers that capture directional information.

#### 2. Parallel Attention Aggregation Block (PAAB)

- **Spatial attention mechanism**: Average pooling and max pooling produce average-pooled and max-pooled features. Three separable convolution layers with different kernel sizes (3, 5, 7) then convolve these features to generate a 2D spatial attention map.
- **Channel attention mechanism**: Global average pooling (GAP) and global max pooling (GMP) produce channel descriptors. Two dense layers then turn these descriptors into the final channel attention map.

#### 3. Decoder

- **Upsampling**: The decoder first upsamples the features generated by the encoder to a higher resolution.
- **Feature fusion**: Skip connections concatenate the upsampled features with low-level features from the encoder. A series of convolution layers and skip connections then refines the segmentation map. A final upsampling step restores the input resolution and produces the polyp segmentation result.

### Experimental results

The model was evaluated on three publicly available benchmark datasets (CVC-ColonDB, CVC-ClinicDB, and Kvasir-SEG) and reached Dice coefficients of 96.20%, 96.54%, and 96.08%, respectively.
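The Dice coefficient measures the overlap between a predicted mask P and the ground-truth mask G as 2|P ∩ G| / (|P| + |G|). In pixel counts, this equals 2TP / (2TP + FP + FN), which shows directly why fewer false positives and false negatives raise the score. Below is a minimal sketch for a single pair of binary masks stored as nested lists. The function name `dice_coefficient` and the small `eps` smoothing term are illustrative assumptions, with `eps` making two empty masks score 1.0. This summary does not say how the paper turns per-image scores into dataset-level figures such as 96.20%. Averaging over test images is a common choice.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2TP / (2TP + FP + FN) for binary masks given as nested lists of 0/1."""
    flat_p = [v for row in pred for v in row]
    flat_g = [v for row in target for v in row]
    if len(flat_p) != len(flat_g):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, g in zip(flat_p, flat_g) if p and g)      # polyp pixels found
    fp = sum(1 for p, g in zip(flat_p, flat_g) if p and not g)  # background marked as polyp
    fn = sum(1 for p, g in zip(flat_p, flat_g) if g and not p)  # polyp pixels missed
    return (2 * tp + eps) / (2 * tp + fp + fn + eps)


ground = [[0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
pred = [[0, 1, 1, 1],   # one false positive at the right edge
        [0, 1, 0, 0],   # one false negative
        [0, 0, 0, 0]]
print(f"{dice_coefficient(pred, ground):.2%}")  # → 75.00%
```

Here TP = 3, FP = 1, and FN = 1, so Dice = 6 / 8. Removing either error would raise the score, which is the effect the reported reduction in false positives and negatives targets.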
The experimental analysis shows that DeepLabV3++ significantly outperforms other state-of-the-art models in polyp segmentation. It substantially reduces false positives and false negatives across small, medium, and large polyps.

### Conclusion

The proposed DeepLabV3++ model introduces the EfficientNetV2S encoder, the MSPP module, and the PAAB module. Together, these components significantly improve the accuracy and robustness of polyp segmentation in colonoscopy images and support more accurate clinical decision-making.
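To make the PAAB data flow from the architecture overview concrete, here is a minimal pure-Python sketch on tiny nested-list feature maps of shape C × H × W. It is not the authors' implementation, and all function names are hypothetical. Several details are not given in this summary and are labeled as assumptions in the comments:

- The two dense layers are shared between the GAP and GMP descriptors, and their outputs are summed before a sigmoid, as in CBAM-style channel attention.
- The three separable convolutions (kernel sizes 3, 5, 7) run as parallel branches whose outputs are summed before a sigmoid.
- `paab` aggregates the two mechanisms by adding the channel-reweighted and spatially reweighted inputs.

Weights are random, so the output only demonstrates shapes and value ranges, not trained behavior.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap, w1, w2):
    """fmap: C x H x W. w1: hidden x C, w2: C x hidden (ASSUMPTION: MLP shared by GAP/GMP)."""
    gap = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]  # global average pooling
    gmp = [max(map(max, ch)) for ch in fmap]                           # global max pooling

    def mlp(v):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]  # dense + ReLU
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]         # dense
    return [sigmoid(a + b) for a, b in zip(mlp(gap), mlp(gmp))]  # one weight per channel


def separable_conv2d(maps, dw_kernels, pw_weights):
    """Depthwise k x k conv per input map ('same' zero padding), then a 1x1 pointwise sum."""
    H, W = len(maps[0]), len(maps[0][0])
    out = [[0.0] * W for _ in range(H)]
    for m, ker, pw in zip(maps, dw_kernels, pw_weights):
        k = len(ker)
        p = k // 2
        for i in range(H):
            for j in range(W):
                s = 0.0
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - p, j + dj - p
                        if 0 <= y < H and 0 <= x < W:
                            s += ker[di][dj] * m[y][x]
                out[i][j] += pw * s  # pointwise weight combines the depthwise outputs
    return out


def spatial_attention(fmap, branches):
    """Channel-wise avg/max pooling -> separable convs (one branch per kernel size) -> H x W map."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    avg = [[sum(fmap[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(fmap[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    total = [[0.0] * W for _ in range(H)]
    for dw_kernels, pw_weights in branches:  # ASSUMPTION: parallel branches, outputs summed
        o = separable_conv2d([avg, mx], dw_kernels, pw_weights)
        total = [[t + v for t, v in zip(tr, orow)] for tr, orow in zip(total, o)]
    return [[sigmoid(v) for v in row] for row in total]


def paab(fmap, w1, w2, branches):
    """HYPOTHETICAL aggregation: add the channel-reweighted and spatially reweighted maps."""
    ca = channel_attention(fmap, w1, w2)
    sa = spatial_attention(fmap, branches)
    return [[[v * ca[c] + v * sa[i][j] for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for c, ch in enumerate(fmap)]


# Usage with random weights (toy sizes, not the paper's).
random.seed(0)
C, H, W, HIDDEN = 4, 6, 6, 2


def rand_kernel(k):
    return [[random.uniform(-0.2, 0.2) for _ in range(k)] for _ in range(k)]


fmap = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-0.5, 0.5) for _ in range(C)] for _ in range(HIDDEN)]
w2 = [[random.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for _ in range(C)]
branches = [([rand_kernel(k), rand_kernel(k)], [random.uniform(-0.5, 0.5) for _ in range(2)])
            for k in (3, 5, 7)]
out = paab(fmap, w1, w2, branches)
print(len(out), len(out[0]), len(out[0][0]))  # → 4 6 6
```

Both attention maps pass through a sigmoid, so every channel weight and every spatial weight lies between 0 and 1. The block therefore rescales features rather than replacing them, and its output keeps the input's C × H × W shape.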

Polyp segmentation in colonoscopy images using DeepLabV3++