FDR-TransUNet: A novel encoder-decoder architecture with vision transformer for improved medical image segmentation

Zhang Chaoyang, Sun Shibao, Hu Wenmao, Zhao Pengcheng
DOI: https://doi.org/10.1016/j.compbiomed.2023.107858
IF: 7.7
2024-02-01
Computers in Biology and Medicine
Abstract: U-shaped and Transformer architectures have achieved exceptional performance in medical image segmentation and natural language processing, respectively. Their combination has also produced remarkable results, but it still suffers from substantial loss of image features during downsampling and from difficulty recovering spatial information during upsampling. In this paper, we propose a novel encoder-decoder architecture for medical image segmentation, with a flexibly adjustable hybrid encoder and a decoder with two expanding paths. The hybrid encoder combines the feature double reuse (FDR) block with the encoder of the Vision Transformer (ViT), which allows it to extract local and global pixel localization information and to alleviate image feature loss effectively. Meanwhile, we retain the original class-token sequence of the Vision Transformer and develop an additional, corresponding expanding path. The class-token sequence and the abstract image features are used by two independent expanding paths with a deep-supervision strategy, which better recovers image spatial information and accelerates model convergence. To further mitigate feature loss and improve spatial information recovery, we introduce successive residual connections throughout the entire network. We evaluated our model on the COVID-19 lung segmentation and infection area segmentation tasks. The mIoU increased by 1.5 points and 3.9 points compared with other models, demonstrating a performance improvement.
engineering, biomedical; computer science, interdisciplinary applications; mathematical & computational biology; biology
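
The abstract reports its gains in mIoU (mean intersection-over-union). As a reminder of what that metric measures, here is a minimal sketch of one common definition: the per-class ratio of overlapping pixels to the union of predicted and ground-truth pixels, averaged over classes. The function name `mean_iou` and the choice to skip classes absent from both masks are assumptions made for illustration; the abstract does not state which averaging convention the authors used.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for two label masks given as lists of rows.

    Hypothetical helper: classes that appear in neither mask are skipped,
    which is one common convention and not necessarily the paper's.
    """
    ious = []
    for c in range(num_classes):
        inter = 0
        union = 0
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                in_pred = p == c
                in_target = t == c
                inter += in_pred and in_target  # pixel labeled c in both masks
                union += in_pred or in_target   # pixel labeled c in either mask
        if union > 0:
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class from range(num_classes) appears in either mask")
    return sum(ious) / len(ious)


# Toy 2x2 masks with background (0) and one foreground class (1).
pred = [[0, 1],
        [1, 1]]
target = [[0, 1],
          [0, 1]]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.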