Abstract: This paper demonstrates that spatial information can be used to learn interpretable representations in medical images using Self-Supervised Learning (SSL). Our proposed method, ISImed, is based on the observation that medical images exhibit much lower variability from one image to the next than classic computer vision benchmarks. By leveraging this resemblance of human body structures across multiple images, we establish a self-supervised objective that creates a latent representation capable of capturing an image crop's location in the physical realm. More specifically, our method involves sampling image crops and creating a distance matrix that compares the distances between the learned representation vectors of all pairs of these crops to the true distances between the crops. The intuition is that the learned latent space is a positional encoding for a given image crop. We hypothesize that learning these positional encodings forces the model to generate comprehensive image representations. To test this hypothesis and evaluate our method, we compare our learned representation with two state-of-the-art SSL benchmarking methods on two publicly available medical imaging datasets. We show that our method can efficiently learn representations that capture the underlying structure of the data and can be transferred to a downstream classification task.

What problem does this paper attempt to address?

The problem this paper addresses is how to use the spatial information inherent in medical images for self-supervised learning (SSL), so as to generate interpretable image representations and improve performance on downstream classification tasks.

### Problem Background

When applying deep neural networks (DNNs) in the medical field, a major challenge is the large amount of labeled data needed to train supervised methods. Labeling medical images usually requires domain experts, and many medical imaging modalities are three-dimensional, which makes labeling time-consuming, error-prone, and costly. Self-supervised learning (SSL), a rapidly developing field of machine learning, can alleviate these problems by extracting useful features and representations from unlabeled data.

### The Method Proposed in the Paper

The authors propose the ISImed framework, which approaches the problem as follows:

1. **Utilizing Spatial Information**: Medical images vary little from one sample to the next, because human body structures look similar across images. Based on this observation, ISImed uses the physical distance between image patches as a learning signal to create a latent representation that captures each patch's position in physical space.
2. **Loss Function Design**: ISImed randomly samples image patches and computes two matrices: the true physical distance matrix \(D_{\text{physical}}\) between these patches, and the distance matrix \(D_{\text{latent}}\) between their learned latent representation vectors. It then defines a simple L2 loss function: \[ \text{Loss} = L_2(D_{\text{physical}}, D_{\text{latent}}) \] This loss encourages distances in the latent space to match the true physical distances between the image patches.
3. **Preventing Information Collapse**: To prevent information collapse in the latent representation (that is, most dimensions becoming uninformative), the authors add the Barlow Twins method as a regularizer. Barlow Twins reduces redundancy by pushing the cross-correlation matrix between the embeddings of two views of the same input toward the identity matrix, which decorrelates the latent dimensions and keeps the representation from collapsing to a constant.
4. **Experimental Verification**: The authors conducted experiments on two publicly available medical image datasets (autoPET and BraTS) to verify the effectiveness of ISImed. The reported results show that ISImed outperforms the other self-supervised learning methods it is compared against in downstream classification tasks.

### Main Contributions

- Proposes ISImed, a new framework for self-supervised learning that uses the inherent spatial information in medical images.
- Demonstrates that learning a spatial position encoding for image patches yields a comprehensive image representation.
- Verifies the method on two different medical image datasets and reports superior performance in downstream classification tasks.

In conclusion, this paper addresses the labeling bottleneck in medical image analysis by introducing the ISImed framework, which learns representations from the spatial layout of unlabeled images.
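
The loss design and the collapse regularizer described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`pairwise_distances`, `distance_matching_loss`, `barlow_twins_term`), the Euclidean metric, the mean-squared reduction, and the `off_diag_weight` default are all choices made here for illustration. How ISImed weights the two terms against each other is not stated in the summary, so the sketch keeps them separate.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix between all pairs of vectors."""
    return [[math.dist(p, q) for q in points] for p in points]


def distance_matching_loss(crop_centers, latents):
    """Mean squared difference between D_physical and D_latent.

    crop_centers: physical coordinates of each sampled crop (assumed known).
    latents: the encoder's output vector for each crop, in the same order.
    """
    d_physical = pairwise_distances(crop_centers)
    d_latent = pairwise_distances(latents)
    n = len(crop_centers)
    total = sum(
        (d_physical[i][j] - d_latent[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    return total / (n * n)


def standardize(batch):
    """Zero-mean, unit-variance scaling of each latent dimension over the batch."""
    n = len(batch)
    scaled_columns = []
    for column in zip(*batch):
        mean = sum(column) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
        std = std or 1.0  # avoid dividing by zero for a constant dimension
        scaled_columns.append([(x - mean) / std for x in column])
    return [list(row) for row in zip(*scaled_columns)]


def barlow_twins_term(view_a, view_b, off_diag_weight=5e-3):
    """Push the cross-correlation matrix of two views toward the identity.

    off_diag_weight is an assumed value for the redundancy-reduction weight.
    """
    za, zb = standardize(view_a), standardize(view_b)
    n, dim = len(za), len(za[0])
    loss = 0.0
    for i in range(dim):
        for j in range(dim):
            c_ij = sum(za[k][i] * zb[k][j] for k in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2  # diagonal: views should agree
            else:
                loss += off_diag_weight * c_ij ** 2  # off-diagonal: decorrelate
    return loss


# Toy usage: three crops with made-up 3D centers and 2D latent vectors.
centers = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]
latents = [(0.1, 0.0), (4.8, 0.3), (0.2, 1.9)]
print(distance_matching_loss(centers, latents))
```

The distance term is zero only when every pairwise latent distance equals the corresponding physical distance, which is what makes the latent space behave like a positional encoding. The Barlow Twins term is zero when each latent dimension agrees across the two views and is uncorrelated with every other dimension.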

ISImed: A Framework for Self-Supervised Learning using Intrinsic Spatial Information in Medical Images
