TBConvL-Net: A Hybrid Deep Learning Architecture for Robust Medical Image Segmentation

Shahzaib Iqbal, Tariq M. Khan, Syed S. Naqvi, Asim Naveed, Erik Meijering
2024-09-05
Abstract: Deep learning has shown great potential for automated medical image segmentation to improve the precision and speed of disease diagnostics. However, the task presents significant difficulties due to variations in the scale, shape, texture, and contrast of the pathologies. Traditional convolutional neural network (CNN) models have certain limitations when it comes to effectively modelling multiscale context information and facilitating information interaction between skip connections across levels. To overcome these limitations, a novel deep learning architecture is introduced for medical image segmentation, taking advantage of CNNs and vision transformers. Our proposed model, named TBConvL-Net, involves a hybrid network that combines the local features of a CNN encoder-decoder architecture with long-range and temporal dependencies using biconvolutional long short-term memory (LSTM) networks and vision transformers (ViT). This enables the model to capture contextual channel relationships in the data and account for the uncertainty of segmentation over time. Additionally, we introduce a novel composite loss function that considers both the segmentation robustness and the boundary agreement of the predicted output with the gold standard. Our proposed model shows consistent improvement over the state of the art on ten publicly available datasets of seven different medical imaging modalities.
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the limitations of traditional convolutional neural network (CNN) models in medical image segmentation: they struggle to model multi-scale context information and to support information interaction between skip connections across levels. Specifically, the paper points out:

1. **Balance between local and global features**: Traditional CNN models focus mainly on local features and overlook global ones. Because lesions in medical images vary in shape and size, global features are necessary for reliable segmentation.
2. **Fixed convolution kernels**: Once trained, convolution kernels cannot adjust to the content of the input image, which makes the network less adaptable to different input features.
3. **Capturing long-range dependencies**: The local operations of a CNN limit its ability to capture long-range dependencies, which can lead to unsatisfactory segmentation results.
4. **Fusion of spatio-temporal information**: Existing methods have difficulty fusing spatio-temporal information effectively when dealing with time-series data, especially when the change in segmentation uncertainty over time must be considered.

To solve these problems, the paper proposes a new deep learning architecture, TBConvL-Net, which combines the advantages of CNNs and vision transformers (ViT) and introduces a bidirectional convolutional long short-term memory network (BConvLSTM) to capture spatio-temporal dependencies. In this way, TBConvL-Net can better handle multi-scale context information and improve the robustness and accuracy of medical image segmentation.
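
The abstract also mentions a composite loss that weighs both segmentation robustness and boundary agreement with the gold standard, but it does not name the individual terms. The sketch below is only an illustration of that general idea, not the authors' loss: it assumes a soft Dice term for region overlap plus a second Dice term computed on boundary maps obtained with a simple 4-neighbour erosion. The function names and the `boundary_weight` parameter are hypothetical.

```python
import numpy as np


def soft_dice_loss(pred, target, eps=1e-6):
    """Return 1 - Dice overlap between a probability map and a mask."""
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)


def boundary_map(mask):
    """Approximate a boundary as the mask minus its 4-neighbour erosion."""
    padded = np.pad(mask, 1, mode="edge")
    eroded = np.minimum.reduce([
        padded[1:-1, 1:-1],                    # centre pixel
        padded[:-2, 1:-1], padded[2:, 1:-1],   # neighbours above and below
        padded[1:-1, :-2], padded[1:-1, 2:],   # neighbours left and right
    ])
    return mask - eroded


def composite_loss(pred, target, boundary_weight=0.5):
    """Region term plus a weighted boundary term (assumed form, see lead-in)."""
    region = soft_dice_loss(pred, target)
    boundary = soft_dice_loss(boundary_map(pred), boundary_map(target))
    return region + boundary_weight * boundary


# Toy example: a 4x4 square lesion in an 8x8 image.
target = np.zeros((8, 8), dtype=float)
target[2:6, 2:6] = 1.0
shifted = np.roll(target, 1, axis=0)  # prediction displaced by one pixel

print(composite_loss(target, target))   # perfect prediction, loss near zero
print(composite_loss(shifted, target))  # displaced prediction, larger loss
```

In this toy setup the boundary term penalises the one-pixel shift more heavily than the region term does, which is the usual motivation for adding a boundary-aware component to an overlap-based loss.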