Abstract: Masked Image Modelling (MIM), a form of self-supervised learning, has achieved significant success in computer vision by improving image representations using unannotated data. Traditional MIM methods typically mask patches sampled at random across the image. However, random masking may not be well suited to medical imaging, whose characteristics differ from those of natural images. In medical imaging, particularly in pathology, disease-related features are often exceedingly sparse and localized, while the remaining regions appear normal and undifferentiated. Additionally, medical images are frequently accompanied by reports that directly pinpoint the location of pathological changes. Inspired by this, we propose Masked medical Image Modelling (MedIM), to our knowledge the first approach that uses radiological reports to guide the masking and restore the informative areas of images, encouraging the network to explore stronger semantic representations from medical images. We introduce two complementary masking strategies: knowledge-driven masking (KDM) and sentence-driven masking (SDM). KDM uses Medical Subject Headings (MeSH) words unique to radiology reports to identify symptom clues (e.g., cardiac, edema, vascular, pulmonary) and guide the mask generation. Recognizing that radiological reports often comprise several sentences detailing varied findings, SDM integrates sentence-level information to identify key regions for masking. MedIM reconstructs images from the masks produced by the KDM and SDM modules, promoting a comprehensive and enriched medical image representation. Our extensive experiments on seven downstream tasks, covering multi-label and multi-class image classification, pneumothorax segmentation, and medical image-report analysis, demonstrate that MedIM with report-guided masking achieves competitive performance. Our method substantially outperforms ImageNet pre-training, MIM-based pre-training, and medical image-report pre-training counterparts. Code is available at https://github.com/YtongXie/MedIM.
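
The idea of report-guided masking can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `report_guided_mask`, the cosine-similarity scoring, the max aggregation over keyword tokens, and the top-k selection are all assumptions chosen for illustration, since the abstract does not specify how relevance is computed. The sketch assumes that patch features and report word features already live in a shared embedding space and that keyword tokens (e.g., MeSH words) have been flagged beforehand.

```python
import numpy as np

def report_guided_mask(patch_feats, word_feats, keyword_flags, mask_ratio=0.5):
    """Return a boolean mask over image patches (True = patch is masked).

    Hypothetical illustration only; not the MedIM code.
    patch_feats   : (N, D) array, one embedding per image patch
    word_feats    : (W, D) array, one embedding per report token
    keyword_flags : (W,) booleans, True where the token is a keyword
                    (assumed to be, e.g., a MeSH word)
    mask_ratio    : fraction of patches to mask
    """
    patch_feats = np.asarray(patch_feats, dtype=float)
    word_feats = np.asarray(word_feats, dtype=float)
    keyword_flags = np.asarray(keyword_flags, dtype=bool)
    if not keyword_flags.any():
        raise ValueError("at least one keyword token is required")

    # L2-normalise so that dot products become cosine similarities.
    p = patch_feats / (np.linalg.norm(patch_feats, axis=1, keepdims=True) + 1e-8)
    w = word_feats[keyword_flags]
    w = w / (np.linalg.norm(w, axis=1, keepdims=True) + 1e-8)

    # Relevance of a patch = its highest similarity to any keyword token.
    relevance = (p @ w.T).max(axis=1)

    # Mask the most report-relevant patches instead of random ones.
    n_mask = int(round(mask_ratio * len(relevance)))
    top = np.argsort(-relevance, kind="stable")[:n_mask]
    mask = np.zeros(len(relevance), dtype=bool)
    mask[top] = True
    return mask
```

Compared with random masking, a scheme of this kind concentrates the reconstruction target on regions that the report describes, which is the motivation stated in the abstract. A sentence-level variant could score patches against sentence embeddings in the same way.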

Kernel Masked Image Modeling Through the Lens of Theoretical Understanding

PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling

SimMIM: A Simple Framework for Masked Image Modeling

Symmetric masking strategy enhances the performance of Masked Image Modeling

Understanding Masked Image Modeling via Learning Occlusion Invariant Feature

BIM: Block-Wise Self-Supervised Learning with Masked Image Modeling

Masked Image Modeling: A Survey

Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning

Improving Pixel-based MIM by Reducing Wasted Modeling Capability

Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling

Rethinking masked image modelling for medical image representation

Membership Inference Attack Against Masked Image Modeling

Masked Image Modeling with Local Multi-Scale Reconstruction

On the Role of Discrete Tokenization in Visual Representation Learning

Masked Image Modeling Advances 3D Medical Image Analysis

Learning with Unmasked Tokens Drives Stronger Vision Learners

SG-MIM: Structured Knowledge Guided Efficient Pre-training for Dense Prediction

Pre-training with Random Orthogonal Projection Image Modeling

Beyond [cls]: Exploring the true potential of Masked Image Modeling representations

Remote Sensing Scene Classification with Masked Image Modeling (MIM)

MedIM: Boost Medical Image Representation via Radiology Report-Guided Masking