D-UNet: a dimension-fusion U shape network for chronic stroke lesion segmentation

Yongjin Zhou, Weijian Huang, Pei Dong, Yong Xia, Shanshan Wang
DOI: https://doi.org/10.1109/TCBB.2019.2939522
2019-08-14
Abstract: Assessing the location and extent of lesions caused by chronic stroke is critical for medical diagnosis, surgical planning, and prognosis. In recent years, with the rapid development of 2D and 3D convolutional neural networks (CNNs), the encoder-decoder structure has shown great potential in the field of medical image segmentation. However, 2D CNNs ignore the 3D information of medical images, while 3D CNNs suffer from high computational resource demands. This paper proposes a new architecture called dimension-fusion-UNet (D-UNet), which combines 2D and 3D convolution in the encoding stage. The proposed architecture achieves better segmentation performance than 2D networks while requiring significantly less computation time than 3D networks. Furthermore, to alleviate the data imbalance between positive and negative samples during network training, we propose a new loss function called Enhance Mixing Loss (EML). This function adds a weighted focal coefficient and combines two traditional loss functions. The proposed method has been tested on the ATLAS dataset and compared to three state-of-the-art methods. The results demonstrate that the proposed method achieves the best performance, with DSC = 0.5349 ± 0.2763 and precision = 0.6331 ± 0.295.
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the segmentation of chronic stroke lesions. The authors aim to accurately assess the location and extent of lesions caused by stroke, which is crucial for medical diagnosis, surgical planning, and prognosis. 2D convolutional neural networks (CNNs) ignore the 3D information in medical images, while 3D CNNs capture spatial information but demand substantial computational resources. The paper therefore proposes a new architecture, the dimension-fusion-UNet (D-UNet), which combines 2D and 3D convolutions in the encoding stage. It achieves better segmentation performance than 2D networks while requiring significantly less computation time than 3D networks. To alleviate the imbalance between positive and negative samples during training, the authors also propose a new loss function, the Enhance Mixing Loss (EML). It adds a weighted focal coefficient and combines two traditional loss functions, Dice loss and focal loss, which enhances gradient propagation, speeds up convergence, and yields a smoother convergence curve.
The main contributions of the paper are as follows:
1. The D-UNet network, an improvement on the 2D UNet. It extracts the spatial information of MRI volume data by adding some 3D convolutions to the down-sampling module and fuses the extracted features with the 2D structure in a novel way.
2. A new loss function that enhances gradient propagation relative to the traditional Dice loss and combines the advantages of Dice loss and focal loss, enabling the network to converge faster and more smoothly.
3. An evaluation on the ATLAS dataset against three state-of-the-art methods, in which the proposed method performs best in terms of DSC (0.5349 ± 0.2763) and precision (0.6331 ± 0.295).
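To make the idea of mixing Dice loss and focal loss concrete, here is a minimal, framework-free sketch in Python. It is not the paper's EML: the summary above does not give the exact formula, so this shows only a generic weighted sum of a soft Dice loss and a binary focal loss, and it does not reproduce the gradient-enhancing modification of the Dice term. The function names and the parameters `smooth`, `alpha`, `gamma`, and `focal_weight` are assumptions chosen for illustration.

```python
import math


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss over flattened foreground probabilities and 0/1 labels."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    # `smooth` (assumed value) keeps the ratio defined when both masks are empty.
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)


def focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss; alpha and gamma are assumed defaults."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)          # clip to avoid log(0)
        p_t = p if t == 1 else 1.0 - p           # probability of the true class
        a_t = alpha if t == 1 else 1.0 - alpha   # class-balancing weight
        # (1 - p_t) ** gamma down-weights voxels that are already well classified.
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def mixed_loss(probs, targets, focal_weight=1.0):
    """Illustrative combination: Dice loss plus a weighted focal loss."""
    return dice_loss(probs, targets) + focal_weight * focal_loss(probs, targets)


if __name__ == "__main__":
    targets = [1, 0, 0, 0]              # one lesion voxel, three background voxels
    good = [0.9, 0.1, 0.1, 0.1]         # mostly correct prediction
    bad = [0.1, 0.9, 0.9, 0.9]          # mostly wrong prediction
    print("good:", mixed_loss(good, targets))
    print("bad: ", mixed_loss(bad, targets))
```

The Dice term is computed over the whole mask, so it is less sensitive to the large number of background voxels, while the focal term concentrates the per-voxel penalty on hard examples. In this sketch a prediction close to the ground truth gets a lower combined loss than one that is mostly wrong.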