FTUNet: A Feature-Enhanced Network for Medical Image Segmentation Based on the Combination of U-Shaped Network and Vision Transformer

Yuefei Wang, Xi Yu, Yixi Yang, Shijie Zeng, Yuquan Xu, Ronghui Feng
DOI: https://doi.org/10.1007/s11063-024-11533-z
IF: 2.565
2024-03-05
Neural Processing Letters
Abstract: Semantic segmentation has been widely applied to a variety of clinical images, where it greatly assists medical diagnosis and related work. To address the reduced semantic inference accuracy caused by feature weakening, a pioneering network called FTUNet (Feature-enhanced Transformer UNet) is introduced, built on the classical encoder-decoder architecture. First, a dual-branch encoder is proposed based on the U-shaped structure: in addition to convolution for feature extraction, a Layer Transformer structure (LTrans) is established to capture long-range dependencies and global context information. Then, an Inception structural module focusing on local features is proposed at the bottleneck; it adopts dilated convolution to enlarge the receptive field, achieving deeper semantic mining on the comprehensive information provided by the dual encoder. Finally, to amplify feature differences, a lightweight feature-polarization attention mechanism is proposed at the skip connections, which strengthens or suppresses feature channels by reallocating weights. Experiments are conducted on 3 different medical datasets, with a comprehensive and detailed comparison against 6 non-U-shaped models, 5 U-shaped models, and 3 Transformer models across 8 categories of metrics. In addition, 9 kinds of layer-by-layer ablation and 4 other embedding attempts are carried out to demonstrate that the current FTUNet structure is optimal.
computer science, artificial intelligence
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses the decline in semantic inference accuracy caused by feature weakening (attenuation) in medical image segmentation. To tackle it, the paper proposes a new network architecture, FTUNet (Feature-enhanced Transformer UNet), built on the classic encoder-decoder structure.

#### Main Contributions:

1. **Dual-branch Encoder**: Based on the U-shaped structure, a dual-branch encoder is proposed. Alongside a convolutional branch for feature extraction, a Layer Transformer structure (LTrans) is introduced to capture long-range dependencies and global contextual information.
2. **Bottleneck Module**: An Inception structure module focusing on local features is placed at the bottleneck. It uses dilated convolution to enlarge the receptive field, enabling deeper semantic mining on the combined information from the dual encoder.
3. **Lightweight Attention Mechanism**: A lightweight feature-polarization attention mechanism is placed at the skip connections. It strengthens or suppresses feature channels by reallocating their weights, thereby amplifying feature differences.

#### Experimental Validation:

Experiments were conducted on three different medical datasets, with a comprehensive and detailed comparison against 6 non-U-shaped models, 5 U-shaped models, and 3 Transformer models across 8 categories of evaluation metrics. In addition, 9 layer-by-layer ablation experiments and 4 alternative embedding attempts were carried out to demonstrate that the current FTUNet structure is optimal.
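
The LTrans branch of the dual-branch encoder is described above only at a high level. To illustrate the generic mechanism a Transformer branch relies on, the sketch below splits an image into patch tokens and applies single-head scaled dot-product self-attention, so each token's output is a weighted mix of every token, however far apart they are in the image. This is not the paper's LTrans: the function names (`image_to_patches`, `matmul`, `self_attention`), the single head, and the omission of layer normalization, residual connections, and the MLP are all simplifying assumptions.

```python
import math


def image_to_patches(image, patch):
    """Split an H x W single-channel image (list of rows) into non-overlapping
    patch x patch tiles, flattening each tile into one token (row-major order)."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by patch")
    return [
        [image[top + i][left + j] for i in range(patch) for j in range(patch)]
        for top in range(0, h, patch)
        for left in range(0, w, patch)
    ]


def matmul(a, b):
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (hypothetical sketch).

    tokens: n x d token embeddings; w_q, w_k, w_v: d x d_k projection matrices.
    Returns the n x d_k outputs and the n x n attention matrix.
    """
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(w_k[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose keys to d_k x n
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    # Each row is a distribution over ALL tokens, so any patch can draw on any
    # other patch regardless of spatial distance (the "long-range" property).
    attn = [softmax(row) for row in scores]
    return matmul(attn, v), attn
```

For example, a 4 x 4 toy image split with `patch=2` yields 4 tokens of dimension 4; with 4 x 4 identity matrices as `w_q`, `w_k`, and `w_v`, `self_attention` returns a 4 x 4 output and a 4 x 4 attention matrix whose rows each sum to 1. Because the softmax gives every token pair a nonzero weight, the first patch receives information from the last one within a single layer, whereas stacked convolutions only connect distant positions once enough layers have grown the receptive field.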
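
The bottleneck module is described as an Inception structure that uses dilated convolution to enlarge the receptive field. A minimal 1-D sketch of that idea follows: a kernel of k weights with dilation d spans k + (k - 1)(d - 1) positions, so parallel branches with different dilation rates see different context widths at the same parameter cost, and their outputs are stacked as channels. The dilation rates (1, 2, 3), the shared kernel size, the 1-D signal, and all function names are assumptions for illustration; the paper's actual branch configuration is not stated here.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span of one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d_same(signal, weights, dilation):
    """1-D dilated cross-correlation (the 'convolution' of deep-learning
    libraries) with zero padding so the output length equals the input length."""
    span = effective_kernel_size(len(weights), dilation)
    if span % 2 == 0:
        raise ValueError("use an odd kernel size for symmetric padding")
    pad = (span - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    # Taps are `dilation` samples apart, leaving holes between kernel weights.
    return [
        sum(w * padded[start + i * dilation] for i, w in enumerate(weights))
        for start in range(len(signal))
    ]


def inception_dilated_block(signal, branch_weights, dilations=(1, 2, 3)):
    """Hypothetical Inception-style block: parallel branches with different
    dilation rates, stacked as output channels (channel concatenation)."""
    return [
        dilated_conv1d_same(signal, w, d) for w, d in zip(branch_weights, dilations)
    ]


def stacked_receptive_field(kernel_size, dilations):
    """Receptive field of stride-1 dilated layers applied one after another."""
    rf = 1
    for d in dilations:
        rf += effective_kernel_size(kernel_size, d) - 1
    return rf
```

With three weights per branch, dilations 1, 2, and 3 cover spans of 3, 5, and 7 samples. Feeding a single spike `[0, 0, 0, 0, 1, 0, 0, 0, 0]` through the d = 3 branch with weights `[1, 1, 1]` lights up positions 1, 4, and 7, three samples apart. Applying the same three layers one after another (stride 1) instead of in parallel would give `stacked_receptive_field(3, (1, 2, 3)) == 13`.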
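
The feature-polarization attention is described only as strengthening or suppressing channels by reallocating weights; its exact formula is not given here. The sketch below is one hypothetical, parameter-light way to realize that behavior, in the spirit of squeeze-and-excitation channel attention: pool each channel to a single descriptor, map its deviation from the mean descriptor to a weight in (0, 2) with a scaled sigmoid, and rescale the channel. `polarize_channels` and `sharpness` are invented names, and a real module would likely learn this mapping rather than fix it.

```python
import math


def _sigmoid(z):
    """Overflow-safe logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def polarize_channels(feature_map, sharpness=4.0):
    """Hypothetical channel 'polarization' for a skip connection.

    feature_map: list of channels, each a flat list of activations.
    sharpness: assumed hyperparameter controlling how strongly weights diverge.
    Returns (reweighted feature map, per-channel weights).
    """
    # Squeeze: one descriptor per channel via global average pooling.
    descriptors = [sum(ch) / len(ch) for ch in feature_map]
    mean_desc = sum(descriptors) / len(descriptors)
    # Polarize: weight = 2 * sigmoid(sharpness * (descriptor - mean)), in (0, 2).
    # Above-average channels get weight > 1 (strengthened), below-average
    # channels get weight < 1 (suppressed), exactly-average channels keep 1.
    weights = [2.0 * _sigmoid(sharpness * (d - mean_desc)) for d in descriptors]
    # Reallocate: scale every activation of a channel by that channel's weight.
    reweighted = [[w * x for x in ch] for w, ch in zip(weights, feature_map)]
    return reweighted, weights
```

For three constant channels with means 1.0, 0.0, and 0.5, the mean descriptor is 0.5, so with the default `sharpness` the first channel is amplified (weight ≈ 1.76), the second suppressed (≈ 0.24), and the third left unchanged (exactly 1.0). Since 2σ(z) + 2σ(-z) = 2, channels sitting symmetrically above and below the mean gain and lose the same amount of weight.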