X-Net: a dual encoding–decoding method in medical image segmentation

Yuanyuan Li,Ziyu Wang,Li Yin,Zhiqin Zhu,Guanqiu Qi,Yu Liu
DOI: https://doi.org/10.1007/s00371-021-02328-7
IF: 2.835
2021-11-05
The Visual Computer
Abstract:Medical image segmentation has the priori guiding significance for clinical diagnosis and treatment. In the past ten years, a large number of experimental facts have proved the great success of deep convolutional neural networks in various medical image segmentation tasks. However, the convolutional networks seem to focus too much on the local image details, while ignoring the long-range dependence. The Transformer structure can encode long-range dependencies in image and learn high-dimensional image information through the self-attention mechanism. But this structure currently depends on the database scale to give full play to its excellent performance, which limits its application in medical images with limited database size. In this paper, the characteristics of CNNs and Transformer are integrated to propose a dual encoding–decoding structure of the X-shaped network (X-Net). It can serve as a good alternative to the traditional pure convolutional medical image segmentation network. In the encoding phase, the local and global features are simultaneously extracted by two types of encoders, convolutional downsampling, and Transformer and then merged through jump connection. In the decoding phase, a variational auto-encoder branch is added to reconstruct the input image itself in order to weaken the impact of insufficient data. Comparative experiments on three medical image datasets show that X-Net can realize the organic combination of Transformer and CNNs.
computer science, software engineering
What problem does this paper attempt to address?