Automatic segmentation of echocardiographic images using a Shifted Windows Vision Transformer architecture

Souha Nemri, Luc Duong
DOI: https://doi.org/10.1088/2057-1976/ad7594
2024-08-30
Abstract: Echocardiography is one of the most commonly used imaging modalities for the diagnosis
of congenital heart disease. Echocardiographic image analysis is crucial to obtaining
accurate cardiac anatomy information. Semantic segmentation models can be used
to precisely delimit the borders of the left ventricle, and allow an accurate and
automatic identification of the region of interest, which can be extremely useful for
cardiologists. In the field of computer vision, convolutional neural network (CNN)
architectures remain dominant. Existing CNN approaches have proved highly efficient
for the segmentation of various medical images over the past decade. However, these
solutions usually struggle to capture long-range dependencies, especially when it comes
to images with objects of different scales and complex structures. In this study, we
present an efficient method for semantic segmentation of echocardiographic images
that overcomes these challenges by leveraging the self-attention mechanism of the
Transformer architecture. The proposed solution extracts long-range dependencies and
efficiently processes objects at different scales, improving performance in a variety of
tasks. We introduce Shifted Windows Transformer models (Swin Transformers), which
encode both the content of anatomical structures and the relationship between them.
Our solution combines the Swin Transformer and U-Net architectures, producing a
U-shaped variant. The proposed method is validated on the EchoNet-Dynamic dataset,
which was also used to train our model. The results show an accuracy
of 0.97, a Dice coefficient of 0.87, and an intersection over union (IoU) of 0.78.
Swin Transformer models are promising for semantically segmenting echocardiographic
images and may help assist cardiologists in automatically analyzing and measuring
complex echocardiographic images.
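
For readers unfamiliar with the three reported metrics, the following is a minimal sketch of their standard definitions for a single pair of binary masks (predicted versus ground-truth left ventricle). It is not the authors' evaluation code: the function names, the nested-list mask format, and the convention for empty masks are assumptions made for illustration, and the abstract does not state how scores were averaged across frames.

```python
from itertools import chain


def confusion_counts(pred, target):
    """Count TP, FP, FN, TN for two binary masks given as 2D lists of 0/1."""
    p = list(chain.from_iterable(pred))
    t = list(chain.from_iterable(target))
    if len(p) != len(t) or not p:
        raise ValueError("masks must be non-empty and have the same size")
    tp = sum(1 for a, b in zip(p, t) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(p, t) if a == 1 and b == 0)
    fn = sum(1 for a, b in zip(p, t) if a == 0 and b == 1)
    tn = len(p) - tp - fp - fn
    return tp, fp, fn, tn


def segmentation_metrics(pred, target):
    """Return pixel accuracy, Dice coefficient, and IoU for one mask pair."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    union = tp + fp + fn
    # Assumed convention: if both masks contain no foreground at all,
    # treat the overlap as perfect rather than dividing by zero.
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    iou = tp / union if union else 1.0
    return {"accuracy": accuracy, "dice": dice, "iou": iou}


# Toy 4x4 example (hypothetical data): the prediction misses one pixel.
pred = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
target = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
print(segmentation_metrics(pred, target))
```

Because accuracy also counts correctly labelled background pixels, it tends to be higher than Dice or IoU whenever the region of interest occupies a small part of the frame, which is consistent with the ordering of the reported values (0.97, 0.87, 0.78).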