Transfer Learning for Microstructure Segmentation with CS-UNet: A Hybrid Algorithm with Transformer and CNN Encoders

Khaled Alrfou, Tian Zhao, Amir Kordijazi
2023-08-27
Abstract: Transfer learning improves the performance of deep learning models by initializing them with parameters pre-trained on larger datasets. Intuitively, transfer learning is more effective when pre-training is done on in-domain datasets. A recent study by NASA has demonstrated that microstructure segmentation with encoder-decoder algorithms benefits more from CNN encoders pre-trained on microscopy images than from those pre-trained on natural images. However, CNN models only capture the local spatial relations in images. In recent years, attention networks such as Transformers have been increasingly used in image analysis to capture the long-range relations between pixels. In this study, we compare the segmentation performance of Transformer and CNN models pre-trained on microscopy images with those pre-trained on natural images. Our results partially confirm the NASA study's finding that the segmentation performance on out-of-distribution images (taken under different imaging and sample conditions) is significantly improved when pre-training on microscopy images. However, the performance gain for one-shot and few-shot learning is more modest with Transformers. We also find that, for image segmentation, the combination of pre-trained Transformer and CNN encoders is consistently better than pre-trained CNN encoders alone. Our dataset (of about 50,000 images) combines the public portion of the NASA dataset with additional images we collected. Even with much less training data, our pre-trained models have significantly better performance for image segmentation. This result suggests that Transformers and CNNs complement each other and that, when pre-trained on microscopy images, they are more beneficial to downstream tasks.
Computer Vision and Pattern Recognition, Materials Science
What problem does this paper attempt to address?
The paper addresses how transfer learning can improve model performance in microstructure segmentation. Specifically, the researchers explore the effectiveness of combining Convolutional Neural Networks (CNNs) and Transformers, particularly when microscopy image datasets are used in the pre-training phase. By comparing CNN and Transformer models pre-trained on natural images with the same models pre-trained on microscopy images, the paper evaluates how the choice of pre-training data affects segmentation performance. The paper also proposes a new hybrid algorithm, CS-UNet, which combines a CNN encoder with a Transformer encoder to capture both local features and long-range dependencies in images.

The main contributions of the paper include:

1. **Evaluation of different pre-training strategies**: The researchers compared models pre-trained on natural images with models pre-trained on microscopy images in the microstructure segmentation task, finding that pre-training on microscopy images can significantly improve segmentation performance on out-of-distribution images (those taken under different imaging and sample conditions).
2. **Proposing the CS-UNet model**: This model uses a CNN encoder and a Transformer encoder in parallel to extract features, and fuses these features into the decoder through skip connections, thereby improving segmentation accuracy.
3. **Experimental validation**: Experiments on multiple microscopy image datasets validated the effectiveness of the CS-UNet model, especially in few-shot learning and out-of-distribution image segmentation tasks.
Overall, the paper offers an approach to microstructure segmentation that is grounded in a systematic evaluation of pre-training strategies and model architectures, and it provides a reference point for future research on the topic.
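The parallel-encoder idea behind CS-UNet (two encoders whose features reach the decoder through skip connections) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The fusion rule (channel-wise concatenation followed by a 1x1 projection), the additive decoder, the nearest-neighbour upsampling, and all names (`fuse_skip`, `upsample2x`, `decode`) are assumptions chosen for illustration, and random arrays stand in for real encoder outputs.

```python
import numpy as np


def fuse_skip(cnn_feat, trans_feat, weight):
    """Fuse two encoder feature maps at one resolution (assumed fusion rule).

    cnn_feat:   (C1, H, W) features from the CNN encoder
    trans_feat: (C2, H, W) features from the Transformer encoder
    weight:     (C_out, C1 + C2) weights of a 1x1 convolution
    """
    if cnn_feat.shape[1:] != trans_feat.shape[1:]:
        raise ValueError("encoder feature maps must share spatial size")
    stacked = np.concatenate([cnn_feat, trans_feat], axis=0)  # (C1 + C2, H, W)
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    return np.einsum("oc,chw->ohw", weight, stacked)


def upsample2x(x):
    """Nearest-neighbour upsampling of a (C, H, W) map by a factor of 2."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def decode(cnn_feats, trans_feats, weights):
    """Toy decoder: walk from the deepest stage up, adding fused skip features.

    Feature lists are ordered from shallow (high resolution) to deep
    (low resolution), and every weight projects to the same channel width.
    """
    fused = [fuse_skip(c, t, w) for c, t, w in zip(cnn_feats, trans_feats, weights)]
    x = fused[-1]
    for skip in reversed(fused[:-1]):
        x = upsample2x(x) + skip
    return x


# Hypothetical three-stage encoders; the channel counts are made up.
rng = np.random.default_rng(0)
sizes = [16, 8, 4]
cnn_channels = [8, 16, 32]
trans_channels = [12, 24, 48]
width = 4  # common decoder width after the 1x1 projection

cnn_feats = [rng.standard_normal((c, s, s)) for c, s in zip(cnn_channels, sizes)]
trans_feats = [rng.standard_normal((c, s, s)) for c, s in zip(trans_channels, sizes)]
weights = [rng.standard_normal((width, c + t))
           for c, t in zip(cnn_channels, trans_channels)]

out = decode(cnn_feats, trans_feats, weights)
print(out.shape)  # (4, 16, 16)
```

In a trained model, the projection weights would be learned and the two encoders would be initialized from pre-trained parameters. The sketch only shows how features from both branches can meet at each decoder stage.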