MedMAE: A Self-Supervised Backbone for Medical Imaging Tasks

Anubhav Gupta, Islam Osman, Mohamed S. Shehata, John W. Braun
2024-07-20
Abstract: Medical imaging tasks are very challenging due to the lack of publicly available labeled datasets. Hence, it is difficult to achieve high performance with existing deep-learning models, as they require a massive labeled dataset to be trained effectively. An alternative solution is to use pre-trained models and fine-tune them on the medical imaging dataset. However, all existing models are pre-trained on natural images, a completely different domain from medical imaging, which leads to poor performance due to domain shift. To overcome these problems, we propose a large-scale unlabeled dataset of medical images and a backbone pre-trained on the proposed dataset with a self-supervised learning technique called the masked autoencoder (MAE). This backbone can be used as a pre-trained model for any medical imaging task, as it is trained to learn a visual representation of different types of medical images. To evaluate the performance of the proposed backbone, we used four different medical imaging tasks. The results are compared with existing pre-trained models. These experiments show the superiority of our proposed backbone in medical imaging tasks.
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses a central obstacle in medical imaging: publicly available annotated datasets are scarce, while existing deep learning models need large amounts of annotated data to train effectively. As a result, these models often perform poorly on medical imaging tasks. The paper makes three contributions:

1. **Large-scale Unannotated Medical Imaging Dataset**: The authors construct a large unannotated dataset spanning several imaging modalities (such as MRI, CT, and X-ray) and multiple body parts, giving it high diversity and broad coverage.
2. **Self-supervised Pre-trained Model**: They propose MedMAE, a ViT-based backbone pre-trained with the Masked Autoencoder (MAE) technique. Pre-training is self-supervised and uses only unannotated medical images, so the model learns representations of different types of medical images.
3. **Multi-task Adaptability**: The pre-trained model can be fine-tuned for a range of downstream medical imaging tasks, such as classification and segmentation.

Together, these contributions aim to provide a general-purpose model that reaches high accuracy on medical imaging tasks even when annotated data is limited. The reported experiments show MedMAE outperforming existing pre-trained models on multiple medical imaging tasks, notably automated quality control, breast cancer prediction, and pneumonia detection.
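To make the Masked Autoencoder idea mentioned above more concrete, here is a minimal sketch of its masking step in plain Python. It is not the authors' implementation. The function names (`patchify`, `random_masking`, `masked_mse`), the 8x8 toy image, the patch size of 2, and the 0.75 mask ratio are all illustrative assumptions; the summary does not state which settings MedMAE uses. The sketch splits an image into patches, hides a random subset, and computes a reconstruction error on the hidden patches only. A real model would place a ViT encoder and a decoder between those steps.

```python
import random


def patchify(image, patch_size):
    """Split a 2D image (list of rows) into non-overlapping square patches.

    Returns patches in row-major order; each patch is a flat list of pixels.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[top + dy][left + dx]
                     for dy in range(patch_size)
                     for dx in range(patch_size)]
            patches.append(patch)
    return patches


def random_masking(num_patches, mask_ratio, rng):
    """Randomly choose which patch indices are hidden from the encoder.

    Returns (visible_indices, masked_indices), each sorted.
    """
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def masked_mse(predicted, target, masked_indices):
    """Mean squared error computed over the masked patches only."""
    total, count = 0.0, 0
    for i in masked_indices:
        for p, t in zip(predicted[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Hypothetical 8x8 grayscale "scan" with intensities in [0, 1].
    image = [[(r * 8 + c) / 63.0 for c in range(8)] for r in range(8)]
    patches = patchify(image, patch_size=2)  # 16 patches of 4 pixels each
    visible, masked = random_masking(len(patches), mask_ratio=0.75, rng=rng)
    # Stand-in for the decoder output: an untrained model predicting zeros.
    predicted = [[0.0] * len(p) for p in patches]
    print("visible patches:", visible)
    print("masked patches:", len(masked))
    print("reconstruction loss:", masked_mse(predicted, patches, masked))
```

Because the loss is taken only over hidden patches, no labels are needed. The image itself provides the training target, which is why this style of pre-training suits a large unannotated dataset.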