CNN-Transformer Rectified Collaborative Learning for Medical Image Segmentation

Lanhu Wu, Miao Zhang, Yongri Piao, Zhenyan Yao, Weibing Sun, Feng Tian, Huchuan Lu

2024-08-28

Abstract: Automatic and precise medical image segmentation (MIS) is of vital importance for clinical diagnosis and analysis. Current MIS methods mainly rely on the convolutional neural network (CNN) or the self-attention mechanism (Transformer) for feature modeling. However, CNN-based methods suffer from inaccurate localization owing to their limited global dependency, while Transformer-based methods tend to produce coarse boundaries owing to the lack of local emphasis. Although some CNN-Transformer hybrid methods are designed to synthesize complementary local and global information for better performance, combining CNN and Transformer introduces numerous parameters and increases the computation cost. To this end, this paper proposes a CNN-Transformer rectified collaborative learning (CTRCL) framework that learns stronger CNN-based and Transformer-based models for MIS tasks via bi-directional knowledge transfer between them. Specifically, we propose a rectified logit-wise collaborative learning (RLCL) strategy, which introduces the ground truth to adaptively select and rectify the wrong regions in student soft labels for accurate knowledge transfer in the logit space. We also propose a class-aware feature-wise collaborative learning (CFCL) strategy to achieve effective knowledge transfer between CNN-based and Transformer-based models in the feature space by granting their intermediate features a similar capability of category perception. Extensive experiments on three popular MIS benchmarks demonstrate that our CTRCL outperforms most state-of-the-art collaborative learning methods under different evaluation metrics.
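The two strategies in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the rectification rule (swapping the probabilities of the wrongly predicted class and the ground-truth class at misclassified pixels), the prototype-based class-aware maps used for feature alignment, the KL-divergence losses, and all function names (`rectify_soft_labels`, `class_aware_map`, `collaborative_losses`) are assumptions chosen for illustration. Each network receives the peer's rectified soft labels and the peer's class-aware map as targets, mirroring the bi-directional transfer described above.

```python
import numpy as np


def softmax(x, axis=1):
    # Numerically stable softmax over the class axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def kl_div(target, pred, eps=1e-8):
    # KL(target || pred) summed over classes, averaged over batch and pixels.
    per_pixel = np.sum(target * (np.log(target + eps) - np.log(pred + eps)), axis=1)
    return float(np.mean(per_pixel))


def rectify_soft_labels(probs, gt):
    """Assumed rectification: at pixels whose argmax disagrees with the ground
    truth, swap the probabilities of the predicted class and the true class.

    probs: (N, C, H, W) soft labels; gt: (N, H, W) integer class map.
    """
    pred = probs.argmax(axis=1)[:, None]           # (N, 1, H, W)
    label = gt[:, None]                            # (N, 1, H, W)
    wrong = pred != label                          # pixels to rectify
    p_pred = np.take_along_axis(probs, pred, axis=1)
    p_label = np.take_along_axis(probs, label, axis=1)
    rect = probs.copy()
    # On correct pixels both writes store the same value, so nothing changes.
    np.put_along_axis(rect, pred, np.where(wrong, p_label, p_pred), axis=1)
    np.put_along_axis(rect, label, np.where(wrong, p_pred, p_label), axis=1)
    return rect


def class_aware_map(feat, gt, num_classes, tau=0.1):
    """Assumed class-aware projection: cosine similarity between each pixel
    feature and ground-truth class prototypes, turned into a class distribution.

    feat: (N, D, H, W), already resized to the label resolution; gt: (N, H, W).
    """
    onehot = np.eye(num_classes)[gt].transpose(0, 3, 1, 2)   # (N, C, H, W)
    # Masked average pooling over the batch gives one prototype per class.
    sums = np.einsum("nchw,ndhw->cd", onehot, feat)
    counts = onehot.sum(axis=(0, 2, 3))[:, None]
    protos = sums / np.maximum(counts, 1.0)                  # (C, D)
    f = feat / (np.linalg.norm(feat, axis=1, keepdims=True) + 1e-8)
    p = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + 1e-8)
    sim = np.einsum("ndhw,cd->nchw", f, p)                    # cosine similarity
    return softmax(sim / tau, axis=1)


def collaborative_losses(cnn_logits, trans_logits, cnn_feat, trans_feat, gt, T=1.0):
    """Return (loss for the CNN, loss for the Transformer); supervised terms omitted."""
    num_classes = cnn_logits.shape[1]
    p_cnn = softmax(cnn_logits / T)
    p_trans = softmax(trans_logits / T)
    # Logit-wise transfer: each model mimics the peer's rectified soft labels.
    loss_cnn = kl_div(rectify_soft_labels(p_trans, gt), p_cnn)
    loss_trans = kl_div(rectify_soft_labels(p_cnn, gt), p_trans)
    # Feature-wise transfer: align class-aware maps; channel widths may differ.
    m_cnn = class_aware_map(cnn_feat, gt, num_classes)
    m_trans = class_aware_map(trans_feat, gt, num_classes)
    loss_cnn += kl_div(m_trans, m_cnn)
    loss_trans += kl_div(m_cnn, m_trans)
    return loss_cnn, loss_trans


# Usage with random arrays: 2 images, 3 classes, 8x8 pixels.
rng = np.random.default_rng(0)
gt = rng.integers(0, 3, size=(2, 8, 8))
l_cnn, l_trans = collaborative_losses(
    rng.normal(size=(2, 3, 8, 8)), rng.normal(size=(2, 3, 8, 8)),
    rng.normal(size=(2, 64, 8, 8)), rng.normal(size=(2, 32, 8, 8)), gt)
```

Projecting both networks' features onto C class prototypes yields maps of shape (N, C, H, W) regardless of channel width, so a 64-channel CNN feature and a 32-channel Transformer feature can be compared directly. In a real training loop these would be autograd tensors, with the peer's targets detached so that each loss only updates its own network.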

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses key issues in medical image segmentation (MIS). Current MIS methods mainly rely on convolutional neural networks (CNNs) or self-attention mechanisms (Transformers) for feature modeling. However, CNN-based methods struggle with accurate localization because of their limited global dependency, while Transformer-based methods produce coarse boundaries because they lack local emphasis. Some CNN-Transformer hybrid methods combine complementary local and global information to improve performance, but they introduce a large number of parameters and increase computational cost.

To overcome these issues, the paper proposes a CNN-Transformer Rectified Collaborative Learning (CTRCL) framework, which trains stronger CNN-based and Transformer-based models for MIS through bidirectional knowledge transfer between them. The framework comprises two strategies:

1. **Rectified Logit-wise Collaborative Learning (RLCL)**: uses the ground truth to adaptively select and rectify erroneous regions in student soft labels, enabling accurate knowledge transfer in the logit space.
2. **Class-aware Feature-wise Collaborative Learning (CFCL)**: achieves effective knowledge transfer in the feature space by giving the intermediate features of the two models a similar capability of category perception.

### Main Contributions

- **First attempt**: According to the authors, CTRCL is the first framework to adopt collaborative learning to train stronger CNN-based and Transformer-based models through bidirectional knowledge transfer in both the logit space and the feature space.
- **RLCL strategy**: A rectified logit-wise collaborative learning strategy that keeps student soft labels reliable by using the ground truth to select and rectify their erroneous regions.
- **CFCL strategy**: A class-aware feature-wise collaborative learning strategy that achieves effective feature-space knowledge transfer between heterogeneous networks by giving their intermediate features a similar capability of category perception.
- **Experimental validation**: Experiments on three popular MIS benchmarks show that CTRCL outperforms most state-of-the-art collaborative learning methods across multiple evaluation metrics. On the Kvasir-SEG dataset, for example, CTRCL reduces the MAE of ResNet-50 and MiT-B2 by 42.93% and 31.23%, respectively.

Through these strategies, CTRCL improves the segmentation accuracy of standalone CNN-based and Transformer-based models instead of merging them into a single, heavier hybrid network.
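The MAE figures quoted above are easier to interpret with the metric spelled out. The sketch below assumes the common polyp-segmentation convention: the absolute difference between a predicted foreground map in [0, 1] and the binary mask is averaged over pixels, then over images. It also reads "reduces MAE by 42.93%" as a relative reduction with respect to the baseline model's MAE. Both readings are assumptions, and the helper names (`mae`, `dataset_mae`, `relative_reduction`) are hypothetical.

```python
import numpy as np


def mae(pred, gt):
    """Mean absolute error between a foreground map in [0, 1] and a binary mask."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("prediction and mask must have the same shape")
    return float(np.mean(np.abs(pred - gt)))


def dataset_mae(preds, gts):
    # Assumed protocol: average the per-image MAE over the test set.
    return float(np.mean([mae(p, g) for p, g in zip(preds, gts)]))


def relative_reduction(baseline, improved):
    """Percentage drop of a lower-is-better metric relative to the baseline.

    Assumes the reported percentages are relative, not absolute, reductions.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline
```

Under this reading, a 42.93% reduction means the collaboratively trained ResNet-50 has an MAE of roughly 57% of its baseline value; the absolute MAE numbers themselves are not stated in this summary.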


MedFCT: A Frequency Domain Joint CNN-Transformer Network for Semi-supervised Medical Image Segmentation

Combinatorial CNN-Transformer Learning with Manifold Constraints for Semi-supervised Medical Image Segmentation

MixFormer: a Mixed CNN-Transformer Backbone for Medical Image Segmentation

Semi-Supervised Convolutional Vision Transformer with Bi-Level Uncertainty Estimation for Medical Image Segmentation

CiT-Net: Convolutional Neural Networks Hand in Hand with Vision Transformers for Medical Image Segmentation

CTransCNN: Combining transformer and CNN in multilabel medical image classification

UCTNet: Uncertainty-guided CNN-Transformer hybrid networks for medical image segmentation

Semi-Supervised Medical Image Segmentation Based on Deep Consistent Collaborative Learning

TC-Net: A joint learning framework based on CNN and vision transformer for multi-lesion medical images segmentation

MMViT-Seg: A Lightweight Transformer and CNN Fusion Network for COVID-19 Segmentation

MCRformer: Morphological constraint reticular transformer for 3D medical image segmentation

TEC-Net: Vision Transformer Embrace Convolutional Neural Networks for Medical Image Segmentation

TFCNs: A CNN-Transformer Hybrid Network for Medical Image Segmentation

Efficient Combination of CNN and Transformer for Dual-Teacher Uncertainty-guided Semi-supervised Medical Image Segmentation

MSCT-UNET: multi-scale contrastive transformer within U-shaped network for medical image segmentation

Multi‐scale contextual learning for medical image segmentation via dual distillation

BMCS-Net: A Bi-directional multi-scale cascaded segmentation network based on transformer-guided feature Aggregation for medical images

TransCC: Transformer Network for Coronary Artery CCTA Segmentation

MCCSeg: Morphological embedding causal constraint network for medical image segmentation

SCL-Net: Structured Collaborative Learning for PET/CT Based Tumor Segmentation