MedMAE: A Self-Supervised Backbone for Medical Imaging Tasks

Anubhav Gupta, Islam Osman, Mohamed S. Shehata, John W. Braun

2024-07-20

Abstract: Medical imaging tasks are very challenging due to the lack of publicly available labeled datasets. Existing deep-learning models need a massive labeled dataset to train effectively, so it is difficult to achieve high performance with them. An alternative is to fine-tune a pre-trained model on the medical imaging dataset. However, all existing models are pre-trained on natural images, a completely different domain from medical imaging, and the resulting domain shift leads to poor performance. To overcome these problems, we propose a large-scale unlabeled dataset of medical images and a backbone pre-trained on the proposed dataset with a self-supervised learning technique called the masked autoencoder (MAE). This backbone can be used as a pre-trained model for any medical imaging task, as it is trained to learn a visual representation of different types of medical images. To evaluate the proposed backbone, we used four different medical imaging tasks and compared the results with existing pre-trained models. These experiments show the superiority of our proposed backbone in medical imaging tasks.

Image and Video Processing, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the difficulty of medical imaging tasks caused by the lack of publicly available annotated datasets. Existing deep learning models need large amounts of annotated data to train effectively, and such data is scarce in medical imaging, so these models often perform unsatisfactorily. To solve this problem, the paper makes the following contributions:

1. **Large-scale Unannotated Medical Imaging Dataset**: A large-scale unannotated dataset is constructed that contains various medical imaging modalities (such as MRI, CT, and X-ray). It covers images of multiple body parts and is both diverse and extensive.
2. **Self-supervised Pre-training Model**: A pre-trained model with a ViT architecture, based on the Masked Autoencoder (MAE) technique and called MedMAE, is proposed. The model is pre-trained on unannotated medical images through self-supervised learning, so that it learns representations of different types of medical images.
3. **Multi-task Adaptability**: The pre-trained model can be applied to various medical imaging tasks, such as classification and segmentation. Fine-tuning it yields good performance on different downstream tasks.

Together, these contributions aim at a general-purpose model that reaches high accuracy on medical imaging tasks even when annotated data is limited. Experimental results show that MedMAE outperforms existing pre-trained models on multiple medical imaging tasks, especially automated quality control, breast cancer prediction, and pneumonia detection.
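The masked-autoencoder idea behind this kind of pre-training can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The toy 8x8 image, the 2x2 patch size, the 0.75 mask ratio, and the helper names `patchify`, `random_mask`, and `masked_mse` are all assumptions made for illustration. A real MAE passes only the visible patches through a ViT encoder and reconstructs the masked ones with a lightweight decoder. Here the "prediction" is a constant placeholder, so only the masking and the masked-patch loss are shown.

```python
import random


def patchify(image, patch):
    """Split a 2D image (list of rows) into non-overlapping patch x patch tiles, row-major."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tile = [image[r][left:left + patch] for r in range(top, top + patch)]
            patches.append(tile)
    return patches


def random_mask(num_patches, mask_ratio, rng):
    """Pick patches to hide uniformly at random; return (visible_ids, masked_ids), both sorted."""
    num_masked = int(num_patches * mask_ratio)
    ids = list(range(num_patches))
    rng.shuffle(ids)
    return sorted(ids[num_masked:]), sorted(ids[:num_masked])


def masked_mse(pred_patches, target_patches, masked_ids):
    """Mean squared error computed over the masked patches only."""
    total, count = 0.0, 0
    for i in masked_ids:
        for pred_row, target_row in zip(pred_patches[i], target_patches[i]):
            for p, t in zip(pred_row, target_row):
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0


rng = random.Random(0)  # fixed seed so the example is reproducible
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]  # toy 8x8 "scan" (assumption)
patches = patchify(image, patch=2)  # 16 patches of 2x2 pixels
visible_ids, masked_ids = random_mask(len(patches), mask_ratio=0.75, rng=rng)

# Placeholder for the decoder output: an all-zero prediction for every patch.
pred = [[[0.0] * 2 for _ in range(2)] for _ in patches]
loss = masked_mse(pred, patches, masked_ids)
print(len(visible_ids), "visible patches,", len(masked_ids), "masked patches")
```

In an actual training loop the placeholder prediction would be replaced by the decoder output and `loss` would be minimized by gradient descent. The sketch only shows that the reconstruction target is restricted to the hidden patches, which is what lets the model learn from unannotated images.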

3D Masked Autoencoders with Application to Anomaly Detection in Non-Contrast Enhanced Breast MRI

MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis

Medical supervised masked autoencoders: Crafting a better masking strategy and efficient fine-tuning schedule for medical image classification

Masked Image Modeling Advances 3D Medical Image Analysis

Deep neural models for automated multi-task diagnostic scan management—quality enhancement, view classification and report generation

Toward High Quality Facial Representation Learning

MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder

SurgMAE: Masked Autoencoders for Long Surgical Video Analysis

Advancing Volumetric Medical Image Segmentation via Global-Local Masked Autoencoder

Big Self-Supervised Models Advance Medical Image Classification

Representation Recovering for Self-Supervised Pre-training on Medical Images

Self-supervised pre-training with contrastive and masked autoencoder methods for dealing with small datasets in deep learning for medical imaging

GMIM: Self-supervised pre-training for 3D medical image segmentation with adaptive and hierarchical masked image modeling

Rethinking masked image modelling for medical image representation

LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching

MEDUSA: Multi-scale Encoder-Decoder Self-Attention Deep Neural Network Architecture for Medical Image Analysis

HybridMIM: A Hybrid Masked Image Modeling Framework for 3D Medical Image Segmentation

Revisiting MAE pre-training for 3D medical image segmentation

Enhancing Network Initialization for Medical AI Models Using Large-Scale, Unlabeled Natural Images