Abstract: Automatic and precise polyp segmentation in colonoscopy images is highly valuable for the early diagnosis and surgical treatment of colorectal cancer. Nevertheless, it remains a major challenge because polyps vary widely in size, have intricate morphology, and are only indistinctly demarcated from the surrounding mucosa. To alleviate these challenges, we proposed an improved dual-aggregation polyp segmentation network, dubbed Dua-PSNet, for automatic and accurate full-size polyp prediction, combining a transformer branch and a fully convolutional network (FCN) branch in parallel. Concretely, in the transformer branch, we adopted the B3 variant of pyramid vision transformer v2 (PVTv2-B3) as the image encoder to capture multi-scale global features and model long-range dependencies between them, and designed a multi-stage feature aggregation decoder (MFAD) to highlight critical local feature details and integrate them effectively into the global features. In the decoder, the adaptive feature aggregation (AFA) block was constructed to fuse high-level feature representations of different scales generated by the PVTv2-B3 encoder in a stepwise, adaptive manner, refining the global semantic information, while the ResidualBlock module was devised to mine detailed boundary cues hidden in low-level features. With the assistance of the selective global-to-local fusion head (SGLFH) module, the resulting boundary details were selectively aggregated with the global semantic features, strengthening these hierarchical features to cope with scale variations of polyps. The FCN branch embedded in the designed ResidualBlock module was used to encourage the extraction of highly merged fine features that match the outputs of the transformer branch to form full-size segmentation maps.
In this way, the two branches influenced and complemented each other, enhancing the discriminability of polyp features and enabling more accurate prediction of full-size segmentation maps. Extensive experiments on five challenging polyp segmentation benchmarks demonstrated that the proposed Dua-PSNet has strong learning and generalization ability and advances the state of the art in segmentation performance over existing cutting-edge methods. These results suggest that Dua-PSNet is a promising solution for practical polyp segmentation tasks, in which wide variations in the data typically occur.
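The abstract does not spell out the internals of the SGLFH module, so the following is only a toy sketch of the general idea of selectively merging coarse global features with fine boundary features. It assumes a per-pixel sigmoid gate and nearest-neighbour upsampling; the function names (`upsample_nearest`, `selective_fuse`) and the gating rule are hypothetical illustrations and are not taken from the paper.

```python
import math


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a 2D grid given as a list of rows."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def selective_fuse(global_feat, local_feat, gate_logits):
    """Hypothetical gated fusion: g * local + (1 - g) * upsampled global.

    g = sigmoid(gate_logit) is computed per pixel, so a large logit favours
    the local (boundary) feature and a small one favours the global feature.
    """
    factor = len(local_feat) // len(global_feat)
    up = upsample_nearest(global_feat, factor)
    fused = []
    for up_row, loc_row, logit_row in zip(up, local_feat, gate_logits):
        fused.append([
            sigmoid(z) * l + (1.0 - sigmoid(z)) * u
            for u, l, z in zip(up_row, loc_row, logit_row)
        ])
    return fused


# Toy inputs: a 2x2 coarse "global" map and a 4x4 fine "local" map.
global_feat = [[0.0, 1.0], [1.0, 0.0]]
local_feat = [[1.0] * 4 for _ in range(4)]
gate_logits = [[0.0] * 4 for _ in range(4)]  # gate of 0.5 everywhere
fused = selective_fuse(global_feat, local_feat, gate_logits)
```

In a real network the gate logits would be predicted from the features themselves (for example by a small convolution) rather than supplied by hand, and the maps would carry a channel dimension. The sketch only shows how a gate can weight boundary detail against global context at each pixel.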

Improved dual-aggregation polyp segmentation network combining a pyramid vision transformer with a fully convolutional network
