Multi-scale Neighborhood Attention Transformer on U-Net for Medical Image Segmentation.

Nanxing Zhang, Shiqiang Ma, Xuejian Li, Jiahui Zhang, Jijun Tang, Fei Guo
DOI: https://doi.org/10.1109/BIBM55620.2022.9994872
2022-01-01
Abstract: U-shaped network structures with skip connections have played an irreplaceable role in medical image analysis, but the inherent limitations of convolution prevent them from learning long-range semantic information well. The recent success of the Transformer in natural language processing and image classification shows that self-attention mechanisms can benefit from global information modeling. However, local and global features are equally important for dense prediction tasks, and the Transformer ignores local semantic information to a certain extent. In this study, we propose a Unet-like Transformer for medical image segmentation, named MN-Unet, which can simultaneously extract local and global features. MN-Unet consists of an encoder, a decoder, and skip connections. Specifically, we design an encoder based on the Neighborhood Attention Transformer, which fuses three different neighborhood sizes to simultaneously extract local and global features. In the decoder, we use bilinear interpolation to restore the image to its original size. Skip connections are added to alleviate the distortion introduced when going from low resolution to high resolution. MN-Unet can achieve accurate segmentation of medical images without any pre-training. Extensive experimental results on two medical image datasets (LiTS 2017 and BraTS 2020) show that MN-Unet achieves better performance than state-of-the-art methods. The code and trained models will be publicly available at https://github.com/hutchinsonian/MN_Unet
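The idea of fusing several neighborhood sizes can be illustrated with a minimal sketch. This is not the MN-Unet implementation: it works on a 1-D token sequence rather than 2-D feature maps, uses the raw features as queries, keys, and values (no learned projections), and fuses the scales by simple averaging. The function names, the sizes (3, 5, 7), and the averaging step are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def neighborhood_attention_1d(x, k):
    """Each token attends only to its k nearest tokens.

    x is a list of n feature vectors (lists of floats). At the borders the
    window is shifted inward so every token still sees exactly k neighbors.
    Assumption: queries, keys, and values are the raw features themselves.
    """
    n, d = len(x), len(x[0])
    if k > n:
        raise ValueError("neighborhood size k must not exceed sequence length")
    out = []
    for i in range(n):
        # Clamp the window start so the window stays inside the sequence.
        start = min(max(i - k // 2, 0), n - k)
        neigh = x[start:start + k]
        # Scaled dot-product scores between token i and its neighbors.
        scores = [sum(q * v for q, v in zip(x[i], nb)) / math.sqrt(d)
                  for nb in neigh]
        # Numerically stable softmax over the neighborhood.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of neighbor features.
        out.append([sum(w * nb[c] for w, nb in zip(weights, neigh))
                    for c in range(d)])
    return out


def multi_scale_neighborhood_attention(x, sizes=(3, 5, 7)):
    """Run neighborhood attention at several sizes and average the results.

    Assumption: averaging is a stand-in for whatever fusion the paper uses.
    """
    outs = [neighborhood_attention_1d(x, k) for k in sizes]
    n, d = len(x), len(x[0])
    return [[sum(o[i][c] for o in outs) / len(outs) for c in range(d)]
            for i in range(n)]
```

Small neighborhoods keep the output close to local context, while larger ones mix in information from farther away; combining them is one way to obtain both kinds of features from a single layer.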