D-TrAttUnet: Dual-Decoder Transformer-Based Attention Unet Architecture for Binary and Multi-classes Covid-19 Infection Segmentation

Fares Bougourzi, Cosimo Distante, Fadi Dornaika, Abdelmalik Taleb-Ahmed
2023-03-28
Abstract: In the last three years, the world has been facing a global crisis caused by the Covid-19 pandemic. Medical imaging has played a crucial role in the fight against this disease and in saving human lives. Indeed, CT scans have proved their efficiency in diagnosing, detecting, and following up Covid-19 infection. In this paper, we propose a new Transformer-CNN based approach for Covid-19 infection segmentation from CT slices. The proposed D-TrAttUnet architecture has an Encoder-Decoder structure, in which a compound Transformer-CNN encoder and Dual-Decoders are proposed. The Transformer-CNN encoder is built using Transformer layers, UpResBlocks, ResBlocks and max-pooling layers. The Dual-Decoder consists of two identical CNN decoders with attention gates. The two decoders are used to segment the infection and the lung regions simultaneously, and the losses of the two tasks are joined. The proposed D-TrAttUnet architecture is evaluated for both Binary and Multi-classes Covid-19 infection segmentation. The experimental results prove the efficiency of the proposed approach in dealing with the complexity of the Covid-19 segmentation task from limited data. Furthermore, the D-TrAttUnet architecture outperforms three baseline CNN segmentation architectures (Unet, AttUnet and Unet++) and three state-of-the-art architectures (AnamNet, SCOATNet and CopleNet) in both Binary and Multi-classes segmentation tasks.
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the segmentation of COVID-19 infections in CT images in two settings: binary segmentation (infected or non-infected) and multi-class segmentation (non-infected, ground-glass opacity (GGO), or consolidation). Specifically, the authors propose a novel Transformer-CNN-based method, the D-TrAttUnet architecture, aimed at two challenges that existing methods face: limited training data and the difficulty of segmenting highly variable COVID-19 infection regions. That variability covers the shape, size, and location of infection regions, as well as the complexity introduced by different infection stages (early and late), symptoms (asymptomatic and symptomatic patients), and severity levels. D-TrAttUnet combines the advantages of Transformers and CNNs, particularly in the encoding stage, to extract local contextual information, long-range dependencies, and global contextual information. To guide the model to focus on the interior of the lungs and exclude non-lung tissue, a dual-decoder structure segments the infection regions and the lung regions simultaneously. In the reported experiments, the method outperforms three baseline CNN architectures (Unet, AttUnet and Unet++) and three state-of-the-art architectures (AnamNet, SCOATNet and CopleNet) in both binary and multi-class segmentation tasks.
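The abstract states that the two decoders segment infection and lung regions simultaneously and that the losses of the two tasks are joined, but it does not give the loss function or how the terms are combined. The following is a minimal sketch of one way such a joint objective could look, assuming per-pixel binary cross-entropy for each task and a weighted sum; the function names and the `lung_weight` parameter are hypothetical and not taken from the paper.

```python
import math


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flattened per-pixel probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def joint_loss(infection_pred, infection_mask, lung_pred, lung_mask,
               lung_weight=1.0):
    """Sum of the infection-task and lung-task losses.

    Assumption: a weighted sum of two cross-entropy terms. The source only
    says the two losses are "joined"; the weighting is illustrative.
    """
    infection_loss = binary_cross_entropy(infection_pred, infection_mask)
    lung_loss = binary_cross_entropy(lung_pred, lung_mask)
    return infection_loss + lung_weight * lung_loss


# Toy 4-pixel "slice": one predicted probability per pixel from each decoder.
infection_pred = [0.9, 0.2, 0.1, 0.8]
infection_mask = [1, 0, 0, 1]
lung_pred = [0.9, 0.9, 0.2, 0.9]
lung_mask = [1, 1, 0, 1]

loss = joint_loss(infection_pred, infection_mask, lung_pred, lung_mask)
print(f"joint loss: {loss:.4f}")
```

In this sketch, setting `lung_weight` to 0 reduces the objective to the infection task alone, which makes the role of the second decoder's term as an auxiliary signal easy to see. For the multi-class setting, the infection term would need a categorical loss over the three classes instead of the binary one used here.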