OVERVIEW OF TRANSFORMER-BASED MODELS FOR MEDICAL IMAGE SEGMENTATION

Diana Nam,Alexandr Pak
DOI: https://doi.org/10.37943/13bkbf2003
2023-03-30
Scientific Journal of Astana IT University
Abstract:Premedical diagnostics is the process of examining survey results. Correct premedical diagnostics can improve the process of patient management and reduce the burden on the medical sector. Diagnostics of medical images such as computed tomography and X-ray are obligatory steps for further treatment. However, the shortage of clinicians causes delays in this step. We observed two state-of-the-art algorithms proposed for medical image segmentation: TransUnet and Swin-Unet. We conducted a theoretical comparison of algorithms in terms of the applicability of pre-hospital diagnostics according to quality and speed of training. The comparison is based on the original source of code provided by the authors of the original articles. We chose these two algorithms because they have similar U-form architecture, a high level of citation, and show competitive DICE scores on pictures of various human organs. Some architectural features were also important. Both models inherit key elements of U-net. TransUnet is a hybrid Transformer and CNN model. It consists of Transformer encoder and a convolutional decoder. Some additional computations are required in the bottleneck. Swin-Unet is a fully Transformer-based model. These architectural differences give rise to a difference in the number of trainable parameters. Generally, deeper architectures with a bigger number of parameters usually show better performance, however, according to our review, Swin-Unet has a smaller number of parameters and shows better DICE and Hausdorff Distance. It should be noted that the distribution between false positive and false negative predictions is important in medical image processing. It is crucial to avoid overloading the medical sector while also not missing any sick patients. Precision and recall can be used to evaluate the ratio of incorrect predictions. Therefore, we also observed the results of caries segmentation where precision and DICE were provided. In this specific case, TransUnet shows better DICE and recall values but worse precision.
What problem does this paper attempt to address?