Abstract: Self-supervised learning approaches leverage unlabeled samples to acquire generic knowledge about different concepts, allowing for annotation-efficient downstream task learning. In this paper, we propose a novel self-supervised method that leverages multiple imaging modalities. We introduce the multimodal puzzle task, which facilitates rich representation learning from multiple image modalities. The learned representations allow for subsequent fine-tuning on different downstream tasks. To achieve that, we learn a modality-agnostic feature embedding by confusing image modalities at the data level. We also use the Sinkhorn operator to formulate puzzle solving as permutation matrix inference instead of classification. Together, these two components allow for efficient solving of multimodal puzzles with varying levels of complexity. In addition, we propose to use cross-modal generation techniques for multimodal data augmentation when training self-supervised tasks. In other words, we exploit synthetic images for self-supervised pretraining rather than for downstream tasks directly, in order to circumvent quality issues associated with synthetic images while improving data efficiency and representation quality. Our experimental results, which assess the gains in downstream performance and data efficiency, show that solving our multimodal puzzles yields better semantic representations than treating each modality independently. Our results also highlight the benefits of exploiting synthetic images for self-supervised pretraining. We showcase our approach on four downstream tasks: brain tumor segmentation and survival days prediction using four MRI modalities, prostate segmentation using two MRI modalities, and liver segmentation using unregistered CT and MRI modalities. We outperform many previous solutions and achieve results competitive with the state of the art.
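The two ingredients named in the abstract, mixing modalities at the data level and relaxing permutation inference with the Sinkhorn operator, can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names `make_multimodal_puzzle` and `sinkhorn`, the parameters `grid`, `tau`, and `n_iters`, the use of co-registered 2D slices, and the choice to draw each tile from a uniformly random modality are all hypothetical.

```python
import numpy as np


def make_multimodal_puzzle(modalities, grid, rng):
    """Build a shuffled puzzle whose tiles come from mixed modalities.

    modalities: array of shape (M, H, W), assumed to be co-registered slices.
    Returns (tiles, perm), where tiles[i] is the tile that originally sat at
    grid position perm[i].
    """
    m, h, w = modalities.shape
    ph, pw = h // grid, w // grid
    tiles = []
    for r in range(grid):
        for c in range(grid):
            k = rng.integers(m)  # assumption: pick a random modality per tile
            tiles.append(modalities[k, r * ph:(r + 1) * ph, c * pw:(c + 1) * pw])
    tiles = np.stack(tiles)
    perm = rng.permutation(grid * grid)
    return tiles[perm], perm


def _logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis, keeping dimensions.
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))


def sinkhorn(scores, tau=1.0, n_iters=20):
    """Approximate a doubly stochastic matrix from a square score matrix.

    Rows and columns are normalized alternately in log space. A smaller tau
    pushes the result closer to a hard permutation matrix.
    """
    log_alpha = np.asarray(scores, dtype=np.float64) / tau
    for _ in range(n_iters):
        log_alpha = log_alpha - _logsumexp(log_alpha, axis=1)  # rows sum to 1
        log_alpha = log_alpha - _logsumexp(log_alpha, axis=0)  # columns sum to 1
    return np.exp(log_alpha)
```

In a training loop, a network would map the shuffled tiles to a square score matrix, `sinkhorn` would turn those scores into a soft permutation, and the loss would compare that soft permutation (or the tiles reordered by it) against the ground truth given by `perm`. The choice of loss here is left open because the abstract does not specify it.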

3D Self-Supervised Methods for Medical Imaging

Self-Supervised Learning for 3D Medical Image Analysis using 3D SimCLR and Monte Carlo Dropout

Self-supervised Feature Learning for 3D Medical Images by Playing a Rubik's Cube

Self-supervised learning via inter-modal reconstruction and feature projection networks for label-efficient 3D-to-2D segmentation

Multimodal Self-Supervised Learning for Medical Image Analysis

Leveraging Unlabeled Data for 3D Medical Image Segmentation through Self-Supervised Contrastive Learning

Multiview Long-Short Spatial Contrastive Learning For 3D Medical Image Analysis

GMIM: Self-supervised pre-training for 3D medical image segmentation with adaptive and hierarchical masked image modeling

Autoregressive Sequence Modeling for 3D Medical Image Representation

Self-Supervised Learning for Non-Rigid Registration Between Near-Isometric 3D Surfaces in Medical Imaging

Unsupervised Segmentation of 3D Medical Images Based on Clustering and Deep Representation Learning

Big Self-Supervised Models Advance Medical Image Classification

T3D: Towards 3D Medical Image Understanding through Vision-Language Pre-training

Revisiting MAE pre-training for 3D medical image segmentation

3D Deep Learning on Medical Images: A Review

Positional Information is a Strong Supervision for Volumetric Medical Image Segmentation

PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation

Time-to-Event Pretraining for 3D Medical Imaging

2.75D: Boosting learning by representing 3D Medical imaging to 2D features for small data

Contrastive self-supervised learning from 100 million medical images with optional supervision