Dual self-distillation of U-shaped networks for 3D medical image segmentation

Soumyanil Banerjee, Ming Dong, Carri Glide-Hurst
2023-06-06
Abstract: U-shaped networks and their variants have demonstrated exceptional results for medical image segmentation. In this paper, we propose a novel dual self-distillation (DSD) framework for U-shaped networks for 3D medical image segmentation. DSD distills knowledge from the ground-truth segmentation labels to the decoder layers and also between the encoder and decoder layers of a single U-shaped network. DSD is a generalized training strategy that can be attached to the backbone architecture of any U-shaped network to further improve its segmentation performance. We attached DSD to two state-of-the-art U-shaped backbones, and extensive experiments on two public 3D medical image segmentation datasets (cardiac substructure and brain tumor) demonstrated significant improvement over those backbones. On average, after attaching DSD to the U-shaped backbones, we observed an improvement of 4.25% and 3.15% in Dice similarity score for cardiac substructure and brain tumor segmentation, respectively.
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the problem of improving the performance of U-shaped networks on 3D medical image segmentation tasks. Specifically, the authors propose a novel Dual Self-Distillation (DSD) framework that enhances a U-shaped network through knowledge distillation from ground-truth labels to the decoder layers and between the encoder and decoder layers.

### Main Contributions:

1. **First application of self-distillation to U-shaped networks**: According to the authors, this is the first time self-distillation has been applied to U-shaped networks for medical image segmentation. The DSD framework is designed so that it can be attached to any U-shaped segmentation backbone.
2. **More general improvement method**: DSD is a general training strategy that can significantly enhance the segmentation performance of any U-shaped backbone. The widely adopted deep supervision method can be regarded as a special case of DSD.
3. **Experimental validation**: The authors conducted extensive experiments on two public 3D medical image segmentation datasets (cardiac substructure and brain tumor), applying the DSD framework to two state-of-the-art U-shaped backbones (one based on ViT and the other based on CNN). The results show significant quantitative and qualitative improvements.

### Method Overview:

- **DSD Framework**: The DSD framework includes two main parts:
  - **Distillation from ground-truth labels**: Knowledge distillation from ground-truth labels to each decoder layer, which is referred to as deep supervision.
  - **Distillation between encoder and decoder layers**: Knowledge distillation from the deepest encoder layer to shallower encoder layers, and from the deepest decoder layer to shallower decoder layers.
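
The two parts of the method overview can be illustrated with a small NumPy sketch. It is not the authors' implementation: the summary does not give the exact loss, so the use of cross-entropy for the ground-truth term, KL divergence for the layer-to-layer term, the weights `alpha` and `beta`, the temperature `tau`, and the shallow-to-deep list ordering are all assumptions made for illustration. Every layer output is assumed to be already projected to class logits and upsampled to label resolution.

```python
import numpy as np

def softmax(logits, tau=1.0):
    """Softmax over the last (class) axis with temperature tau."""
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)  # stabilise the exponent
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean cross-entropy between per-voxel logits and integer labels."""
    p = softmax(logits)
    picked = p[np.arange(labels.size), labels]
    return float(-np.log(picked + 1e-12).mean())

def kl_divergence(teacher_logits, student_logits, tau=1.0):
    """Mean KL(teacher || student) over voxels on softened outputs."""
    t = softmax(teacher_logits, tau)
    s = softmax(student_logits, tau)
    per_voxel = (t * (np.log(t + 1e-12) - np.log(s + 1e-12))).sum(axis=-1)
    return float(per_voxel.mean())

def dsd_loss(encoder_logits, decoder_logits, labels,
             alpha=1.0, beta=1.0, tau=1.0):
    """Hypothetical DSD-style objective (names and weights are assumptions).

    encoder_logits / decoder_logits: lists of (n_voxels, n_classes) arrays,
    ordered shallow -> deep, so the last entry acts as the teacher.
    """
    # Part 1: ground truth -> every decoder layer (deep supervision).
    supervised = sum(cross_entropy(d, labels) for d in decoder_logits)
    # Part 2: deepest layer (teacher) -> shallower layers (students).
    enc_teacher, dec_teacher = encoder_logits[-1], decoder_logits[-1]
    distill = sum(kl_divergence(enc_teacher, e, tau) for e in encoder_logits[:-1])
    distill += sum(kl_divergence(dec_teacher, d, tau) for d in decoder_logits[:-1])
    return alpha * supervised + beta * distill

# Toy usage: 8 voxels, 3 classes, 3 encoder and 3 decoder layers.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=8)
enc = [rng.normal(size=(8, 3)) for _ in range(3)]
dec = [rng.normal(size=(8, 3)) for _ in range(3)]
print(dsd_loss(enc, dec, labels))
```

With `beta=0.0` only the ground-truth term remains, which mirrors the statement that deep supervision can be seen as a special case of DSD. In a framework with automatic differentiation, the teacher outputs would typically be excluded from gradient computation; NumPy has no gradients, so that step does not appear here.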

### Experimental Results:

- **Cardiac substructure segmentation**: On the MMWHS dataset, the DSD framework improved the Dice similarity score of UNETR and nnU-Net by 4.25% and 4.3%, respectively, and reduced the Hausdorff distance by 8.4 mm and 10.6 mm, respectively.
- **Brain tumor segmentation**: On the MSD dataset, the DSD framework improved the Dice similarity score of UNETR and nnU-Net by 3.6% and 2.7%, respectively, and reduced the Hausdorff distance by 9.2 mm and 3.8 mm, respectively.

### Conclusion:

The DSD framework is an effective training strategy that can significantly enhance the performance of U-shaped networks in 3D medical image segmentation tasks. The method improves the quantitative metrics and also yields noticeable improvements in the qualitative results.
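
For readers unfamiliar with the two metrics reported in the experimental results, the following is a minimal sketch of how a Dice similarity score and a Hausdorff distance can be computed for a pair of binary 3D masks. It is an illustration under assumptions, not the evaluation code used in the paper: the function names and the `spacing` parameter (voxel size in mm) are hypothetical, the distance is computed by brute force over all foreground voxels, and evaluation toolkits often use surface voxels or a 95th-percentile variant instead.

```python
import numpy as np

def dice_score(pred, target):
    """Dice similarity 2|A n B| / (|A| + |B|) for two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(pred, target).sum() / denom)

def hausdorff_distance(pred, target, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance between foreground voxels, in units of spacing."""
    a = np.argwhere(pred) * np.asarray(spacing)    # physical coordinates of pred voxels
    b = np.argwhere(target) * np.asarray(spacing)  # physical coordinates of target voxels
    if len(a) == 0 or len(b) == 0:
        return float("inf")  # undefined for an empty mask; inf is a convention
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Largest nearest-neighbour distance in either direction.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

# Toy example: the prediction misses one slice of the target.
pred = np.zeros((4, 4, 4))
pred[1:3, 1:3, 1:3] = 1     # 8 voxels
target = np.zeros((4, 4, 4))
target[1:3, 1:3, 1:4] = 1   # 12 voxels, a superset of pred

print(dice_score(pred, target))          # 0.8  (2 * 8 / (8 + 12))
print(hausdorff_distance(pred, target))  # 1.0  (missed slice is one voxel away)
```

The brute-force pairwise distance matrix is only practical for small masks; for full-size volumes a distance-transform or KD-tree approach would be the usual choice.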