Transfer Learning for Microstructure Segmentation with CS-UNet: A Hybrid Algorithm with Transformer and CNN Encoders

Khaled Alrfou, Tian Zhao, Amir Kordijazi

2023-08-27

Abstract: Transfer learning improves the performance of deep learning models by initializing them with parameters pre-trained on larger datasets. Intuitively, transfer learning is more effective when pre-training is on in-domain datasets. A recent study by NASA has demonstrated that microstructure segmentation with encoder-decoder algorithms benefits more from CNN encoders pre-trained on microscopy images than from those pre-trained on natural images. However, CNN models only capture the local spatial relations in images. In recent years, attention networks such as Transformers have been increasingly used in image analysis to capture the long-range relations between pixels. In this study, we compare the segmentation performance of Transformer and CNN models pre-trained on microscopy images with those pre-trained on natural images. Our results partially confirm the NASA study: segmentation performance on out-of-distribution images (taken under different imaging and sample conditions) is significantly improved when pre-training on microscopy images. However, the performance gain for one-shot and few-shot learning is more modest with Transformers. We also find that for image segmentation, the combination of pre-trained Transformer and CNN encoders is consistently better than pre-trained CNN encoders alone. Our dataset (of about 50,000 images) combines the public portion of the NASA dataset with additional images we collected. Even with much less training data, our pre-trained models have significantly better performance for image segmentation. This result suggests that Transformers and CNNs complement each other and that, when pre-trained on microscopy images, they are more beneficial to the downstream tasks.

Computer Vision and Pattern Recognition, Materials Science

What problem does this paper attempt to address?

The paper addresses how transfer learning can improve model performance in microstructure segmentation. Specifically, the researchers explore the effectiveness of combining Convolutional Neural Networks (CNNs) and Transformers, particularly when microscopy image datasets are used during the pre-training phase. By comparing CNN and Transformer models pre-trained on natural images with those pre-trained on microscopy images, the paper evaluates the impact of different pre-training strategies on segmentation performance. The paper also proposes a new hybrid algorithm, CS-UNet, which combines the advantages of CNNs and Transformers to capture both local features and long-range dependencies in images.

The main contributions of the paper include:

1. **Evaluation of different pre-training strategies**: The researchers compared models pre-trained on natural images with models pre-trained on microscopy images for microstructure segmentation, and found that pre-training on microscopy images significantly improves segmentation performance on out-of-distribution images (those taken under different imaging and sample conditions).
2. **Proposing the CS-UNet model**: This model uses a CNN encoder and a Transformer encoder in parallel to extract features, and fuses these features into the decoder through skip connections, thereby improving segmentation accuracy.
3. **Experimental validation**: Experiments on multiple microscopy image datasets validated the effectiveness of the CS-UNet model, especially in few-shot learning and out-of-distribution image segmentation tasks.

Overall, the paper offers an effective approach to microstructure segmentation by systematically evaluating different pre-training strategies and model architectures, and it provides a useful reference for future related research.
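The dual-encoder idea described in the summary (two encoders run in parallel, with their features fused and passed to the decoder through skip connections) can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `ToyCNNEncoder`, `ToyAttentionEncoder`, and `DualEncoderUNet` are hypothetical names, the attention branch is a small stand-in rather than a pre-trained Transformer, and fusing by channel concatenation followed by a 1x1 convolution is one plausible choice that the source does not specify.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch, out_ch):
    """3x3 convolution + batch norm + ReLU, preserving spatial size."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class ToyCNNEncoder(nn.Module):
    """Hypothetical CNN branch: each stage halves the resolution."""

    def __init__(self, in_ch, widths):
        super().__init__()
        chans = (in_ch,) + tuple(widths)
        self.stages = nn.ModuleList([
            nn.Sequential(conv_block(chans[i], chans[i + 1]), nn.MaxPool2d(2))
            for i in range(len(widths))
        ])

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # one feature map per scale: H/2, H/4, H/8, ...


class ToyAttentionEncoder(nn.Module):
    """Hypothetical stand-in for a Transformer branch: a strided patch
    embedding followed by global self-attention at every scale."""

    def __init__(self, in_ch, widths, heads=4):
        super().__init__()
        chans = (in_ch,) + tuple(widths)
        self.embeds = nn.ModuleList([
            nn.Conv2d(chans[i], chans[i + 1], kernel_size=2, stride=2)
            for i in range(len(widths))
        ])
        self.norms = nn.ModuleList([nn.LayerNorm(w) for w in widths])
        self.attns = nn.ModuleList([
            nn.MultiheadAttention(w, heads, batch_first=True) for w in widths
        ])

    def forward(self, x):
        feats = []
        for embed, norm, attn in zip(self.embeds, self.norms, self.attns):
            x = embed(x)
            b, c, h, w = x.shape
            tokens = x.flatten(2).transpose(1, 2)      # (B, H*W, C)
            normed = norm(tokens)
            attended, _ = attn(normed, normed, normed)  # all-to-all attention
            tokens = tokens + attended                  # residual connection
            x = tokens.transpose(1, 2).reshape(b, c, h, w)
            feats.append(x)
        return feats


class DualEncoderUNet(nn.Module):
    """Runs both encoders in parallel and feeds fused features to a
    U-Net-style decoder through skip connections."""

    def __init__(self, in_ch=1, num_classes=2, widths=(16, 32, 64)):
        super().__init__()
        self.cnn = ToyCNNEncoder(in_ch, widths)
        self.attn = ToyAttentionEncoder(in_ch, widths)
        # Assumed fusion: concatenate both branches, then a 1x1 convolution.
        self.fuse = nn.ModuleList([
            nn.Conv2d(2 * w, w, kernel_size=1) for w in widths
        ])
        # Decoder blocks ordered from the deepest scale to the shallowest.
        self.up_blocks = nn.ModuleList([
            conv_block(widths[i + 1] + widths[i], widths[i])
            for i in reversed(range(len(widths) - 1))
        ])
        self.head = nn.Conv2d(widths[0], num_classes, kernel_size=1)

    def forward(self, x):
        size = x.shape[-2:]
        fused = [
            f(torch.cat([c, t], dim=1))
            for f, c, t in zip(self.fuse, self.cnn(x), self.attn(x))
        ]
        y = fused[-1]
        for block, skip in zip(self.up_blocks, reversed(fused[:-1])):
            y = F.interpolate(y, size=skip.shape[-2:], mode="bilinear",
                              align_corners=False)
            y = block(torch.cat([y, skip], dim=1))     # skip connection
        y = F.interpolate(y, size=size, mode="bilinear", align_corners=False)
        return self.head(y)                            # per-pixel class logits


model = DualEncoderUNet(in_ch=1, num_classes=3)
model.eval()
with torch.no_grad():
    logits = model(torch.randn(2, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 3, 64, 64])
```

In a transfer-learning setting, the two toy encoders would be replaced by backbones whose weights were pre-trained on microscopy or natural images, while the fusion layers and decoder are trained on the target segmentation data.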

CS-UNet: A generalizable and flexible segmentation algorithm

Mixed Transformer U-Net for Medical Image Segmentation

TF-Unet: An Automatic Cardiac MRI Image Segmentation Method

MSCT-UNET: multi-scale contrastive transformer within U-shaped network for medical image segmentation

TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

EG-TransUNet: a transformer-based U-Net with enhanced and guided models for biomedical image segmentation

UNETR: Transformers for 3D Medical Image Segmentation

3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers

UNetFormer: A Unified Vision Transformer Model and Pre-Training Framework for 3D Medical Image Segmentation

TransNorm: Transformer Provides a Strong Spatial Normalization Mechanism for a Deep Segmentation Model

ETUNet: Exploring efficient transformer enhanced UNet for 3D brain tumor segmentation

Going Beyond U-Net: Assessing Vision Transformers for Semantic Segmentation in Microscopy Image Analysis

A Novel Deep Learning Model for Medical Image Segmentation with Convolutional Neural Network and Transformer

3D-TransUNet for Brain Metastases Segmentation in the BraTS2023 Challenge

3D Medical image segmentation using parallel transformers

TSCA-Net: Transformer based spatial-channel attention segmentation network for medical images

Sfe-Transunet: A Transformer-Based U-Net With Skipped Features Enhancer For Medical Image Segmentation

CSWin-UNet: Transformer UNet with Cross-Shaped Windows for Medical Image Segmentation

TransSea: Hybrid CNN-Transformer with Semantic Awareness for 3D Brain Tumor Segmentation

CiT-Net: Convolutional Neural Networks Hand in Hand with Vision Transformers for Medical Image Segmentation