MDViT: Multi-domain Vision Transformer for Small Medical Image Segmentation Datasets

Siyi Du, Nourhan Bayasi, Ghassan Hamarneh, Rafeef Garbi

2024-06-07

Abstract: Despite its clinical utility, medical image segmentation (MIS) remains a daunting task due to images' inherent complexity and variability. Vision transformers (ViTs) have recently emerged as a promising solution to improve MIS; however, they require larger training datasets than convolutional neural networks. To overcome this obstacle, data-efficient ViTs were proposed, but they are typically trained using a single source of data, which overlooks the valuable knowledge that could be leveraged from other available datasets. Naively combining datasets from different domains can result in negative knowledge transfer (NKT), i.e., a decrease in model performance on some domains with non-negligible inter-domain heterogeneity. In this paper, we propose MDViT, the first multi-domain ViT that includes domain adapters to mitigate data-hunger and combat NKT by adaptively exploiting knowledge in multiple small data resources (domains). Further, to enhance representation learning across domains, we integrate a mutual knowledge distillation paradigm that transfers knowledge between a universal network (spanning all the domains) and auxiliary domain-specific branches. Experiments on 4 skin lesion segmentation datasets show that MDViT outperforms state-of-the-art algorithms, with superior segmentation performance and a model size that stays fixed at inference time even as more domains are added. Our code is available at https://github.com/siyi-wind/MDViT.
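The mutual knowledge distillation idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss that is used, so a symmetric KL divergence between the per-pixel class distributions of the universal network and one domain-specific branch is assumed here. The function names, the temperature parameter, and the flat list-of-logits input format are all hypothetical and are not taken from the MDViT code base.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw class scores for one pixel into probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_distillation_loss(universal_logits, branch_logits, temperature=1.0):
    """Assumed loss: symmetric KL between the two networks, averaged over pixels.

    Each argument is a list of per-pixel logit lists. Both directions are
    summed so that knowledge flows universal -> branch and branch -> universal.
    """
    total = 0.0
    for u, b in zip(universal_logits, branch_logits):
        p_u = softmax(u, temperature)
        p_b = softmax(b, temperature)
        total += kl_divergence(p_u, p_b) + kl_divergence(p_b, p_u)
    return total / len(universal_logits)
```

Under this assumption the loss is zero when both networks predict the same distribution at every pixel and grows as their predictions diverge, which is what lets each network act as a soft teacher for the other.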

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the difficulty of training Vision Transformers (ViTs) on small medical image segmentation datasets. Although ViTs have shown potential in Medical Image Segmentation (MIS), they need more training data than Convolutional Neural Networks (CNNs), and that much data is often unavailable in practice. In addition, simply combining datasets from different domains may lead to Negative Knowledge Transfer (NKT): the model's performance drops on some domains because the data are significantly heterogeneous across domains. To solve these problems, the authors propose MDViT (Multi-domain Vision Transformer), a multi-domain ViT with Domain Adapters that aims to reduce data requirements and combat NKT by adaptively using the knowledge in multiple small data resources (domains). To enhance cross-domain representation learning, the authors also integrate a Mutual Knowledge Distillation paradigm, which transfers knowledge between the universal network (covering all domains) and auxiliary domain-specific branches. With these components, MDViT improves segmentation performance while keeping the model size fixed, even when more domains are added. The experimental results show that MDViT outperforms existing state-of-the-art algorithms on four skin lesion segmentation datasets. In particular, on the skin cancer detection dataset, the IoU increases by 10.16% compared with Separate Training (ST).
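To make the IoU metric and the notion of negative knowledge transfer concrete, here is a minimal sketch. It assumes binary masks stored as flat lists of 0/1 values. The helper `negative_transfer_domains` and the domain names used with it are hypothetical illustrations, not part of the paper or its code.

```python
def iou(pred_mask, true_mask):
    """Intersection over Union for two binary masks given as flat 0/1 lists."""
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    union = sum(1 for p, t in zip(pred_mask, true_mask) if p or t)
    # Two empty masks agree perfectly; treating that case as 1.0 is a convention.
    return intersection / union if union else 1.0


def negative_transfer_domains(separate_iou, joint_iou):
    """Return the domains whose IoU is lower under joint training.

    Both arguments map a domain name to its IoU. A domain that scores worse
    after naive joint training than after separate training shows NKT.
    """
    return [d for d in separate_iou if joint_iou[d] < separate_iou[d]]
```

For example, with placeholder scores `{"domain_a": 0.80, "domain_b": 0.70}` for separate training and `{"domain_a": 0.82, "domain_b": 0.65}` for joint training, only `domain_b` is flagged. These numbers are invented for illustration and are not results from the paper.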
