MISSFormer: an Effective Transformer for 2D Medical Image Segmentation

Xiaohong Huang, Zhifang Deng, Dandan Li, Xueguang Yuan, Ying Fu
DOI: https://doi.org/10.1109/tmi.2022.3230943
IF: 10.6
2023-01-01
IEEE Transactions on Medical Imaging
Abstract: Transformer-based methods have recently become popular in vision tasks because of their capability to model global dependencies. However, modeling global dependencies alone limits network performance, because local context and the global-local correlations of multi-scale features are not modeled. In this paper, we present MISSFormer, a Medical Image Segmentation tranSFormer. MISSFormer is a hierarchical encoder-decoder network with two appealing designs: 1) the feed-forward network in the transformer block of the U-shaped encoder-decoder structure is redesigned as ReMix-FFN, which explores global dependencies and local context for better feature discrimination by re-integrating the two; 2) a ReMixed Transformer Context Bridge is proposed to extract the correlations of global dependencies and local context in multi-scale features generated by our hierarchical transformer encoder. MISSFormer shows a solid capacity to capture more discriminative dependencies and context in medical image segmentation. Experiments on multi-organ segmentation, cardiac segmentation, and retinal vessel segmentation tasks demonstrate the superiority, effectiveness, and robustness of MISSFormer. Specifically, MISSFormer trained from scratch even outperforms state-of-the-art methods pre-trained on ImageNet, and the core designs can be generalized to other visual segmentation tasks. The code has been released on GitHub: https://github.com/ZhifangDeng/MISSFormer.