W-MAFormer: W-shaped Multi-Attention Assisted Transformer for Polyp Segmentation

M. Yi, Y. Su, Y. Shen, W. Wang
DOI: https://doi.org/10.1117/12.3008772
2024-01-01
Abstract: Colorectal cancer, ranked as the third deadliest disease globally, can be effectively prevented through the timely detection and removal of colorectal polyps. Precise diagnosis requires accurate segmentation of these polyps, a task where existing deep learning solutions exhibit limitations. Specifically, some hierarchical models employ separate decoder branches at each layer, while others gradually combine the information layer by layer. This introduces instability, as each feature map goes through a different process, and can result in a discontinuous segmentation map. To bridge this gap, we introduce the W-shaped Multi-Attention Assisted Transformer (W-MAFormer) for polyp segmentation, designed to enhance shared features across different levels instead of processing them separately. The decoder employs transformer modules in lieu of conventional convolutional blocks. Structurally, our encoder builds on the pyramid vision transformer, while our decoder combines three modules: Reference Feature Extractor (RFE), Semantic Feature Enhancement (SFE), and Reverse Attention Decoder (RAD). The SFE module employs mutual and dual attention mechanisms to augment shared information across varying scales and channels of feature maps. This enhancement requires a robust reference map, which the RFE provides. The enhanced feature map is then passed to the RAD, which employs reverse attention operations to yield the final prediction. Throughout this architecture, attention mechanisms preserve shared features. Our evaluation on five prominent datasets shows quantitative results that outperform several contemporary state-of-the-art semantic segmentation methods.
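The abstract does not spell out how the RAD computes reverse attention. As a rough illustration only, the sketch below assumes the formulation commonly used in earlier polyp segmentation work: a coarse prediction is passed through a sigmoid, inverted, and used to reweight the feature map so that regions not yet confidently segmented receive more emphasis. The function names, the nested-list tensor layout, and the toy values are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def reverse_attention(features, coarse_logits):
    """Reweight features by (1 - sigmoid(coarse prediction)).

    Assumed layout (hypothetical, for illustration):
      features:      list of C channels, each an H x W nested list
      coarse_logits: H x W nested list of logits from a coarser stage
    """
    # High weight where the coarse map predicts background or is uncertain.
    weights = [[1.0 - sigmoid(v) for v in row] for row in coarse_logits]
    # Apply the same spatial weight map to every channel.
    return [
        [
            [f * w for f, w in zip(f_row, w_row)]
            for f_row, w_row in zip(channel, weights)
        ]
        for channel in features
    ]


# Toy input: one channel, 2 x 2 spatial grid.
features = [[[1.0, 2.0], [3.0, 4.0]]]
coarse_logits = [[0.0, 10.0], [-10.0, 0.0]]
refined = reverse_attention(features, coarse_logits)
```

In this toy input, the position with a strongly positive logit (confident foreground) is almost entirely suppressed, while the position with a strongly negative logit passes through nearly unchanged. Whether W-MAFormer's RAD follows this exact form is an assumption.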