ISImed: A Framework for Self-Supervised Learning using Intrinsic Spatial Information in Medical Images

Nabil Jabareen, Dongsheng Yuan, Sören Lukassen
2024-10-22
Abstract: This paper demonstrates that spatial information can be used to learn interpretable representations in medical images using Self-Supervised Learning (SSL). Our proposed method, ISImed, is based on the observation that medical images exhibit a much lower variability among different images compared to classic data vision benchmarks. By leveraging this resemblance of human body structures across multiple images, we establish a self-supervised objective that creates a latent representation capable of capturing its location in the physical realm. More specifically, our method involves sampling image crops and creating a distance matrix that compares the learned representation vectors of all possible combinations of these crops to the true distance between them. The intuition is that the learned latent space is a positional encoding for a given image crop. We hypothesize that, by learning these positional encodings, comprehensive image representations have to be generated. To test this hypothesis and evaluate our method, we compare our learned representation with two state-of-the-art SSL benchmarking methods on two publicly available medical imaging datasets. We show that our method can efficiently learn representations that capture the underlying structure of the data and can be used to transfer to a downstream classification task.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper asks how the spatial information inherent in medical images can be used for self-supervised learning (SSL) to produce interpretable image representations that transfer to downstream classification tasks.

### Problem Background

A major challenge in applying deep neural networks (DNNs) to medicine is that supervised methods need large amounts of labeled data. Labeling medical images usually requires domain experts, and many medical imaging modalities are three-dimensional, which makes labeling time-consuming, error-prone, and costly. SSL, a rapidly developing field of machine learning, can alleviate these problems by extracting useful features and representations from unlabeled data.

### The Method Proposed in the Paper

The authors propose the ISImed framework, which addresses the problem as follows:

1. **Utilizing Spatial Information**: Medical images vary little between samples, because human body structures are similar across images. ISImed therefore uses the physical distance between image patches as a learning signal, creating a latent representation that captures each patch's position in physical space.
2. **Loss Function Design**: ISImed randomly samples image patches and computes two matrices: the true physical distance matrix \(D_{\text{physical}}\) between the patches, and the distance matrix \(D_{\text{latent}}\) between their learned latent representation vectors. It then defines a simple L2 loss between the two:

   \[ \text{Loss} = L_2(D_{\text{physical}}, D_{\text{latent}}) \]

   This loss encourages distances in the latent space to match the true physical distances between the patches.
3. **Preventing Information Collapse**: To keep the latent representation from collapsing (that is, most dimensions becoming uninformative), the authors add Barlow Twins as a regularizer. Barlow Twins reduces redundancy by driving the cross-correlation matrix between the embeddings of two views toward the identity matrix, which decorrelates the latent dimensions and prevents the representation from collapsing to a constant.
4. **Experimental Verification**: The authors evaluate ISImed on two publicly available medical imaging datasets (autoPET and BraTS) and compare it with two state-of-the-art SSL methods. The results show that ISImed learns representations that capture the underlying structure of the data and transfer to a downstream classification task.

### Main Contributions

- Proposes ISImed, a new framework for self-supervised learning that uses the spatial information inherent in medical images.
- Shows that learning a spatial position encoding of image patches yields comprehensive image representations.
- Evaluates the method on two different medical imaging datasets and shows that the learned representations transfer to a downstream classification task.

In conclusion, the paper introduces ISImed as a way to exploit spatial structure for self-supervised learning on medical images, offering a new approach to reducing the dependence of medical image analysis on labeled data.
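
As a rough illustration of the loss design summarized above, the following NumPy sketch builds the physical and latent distance matrices, compares them, and adds a Barlow Twins style penalty. It is not the authors' implementation. The function names, the reading of the "L2" loss as a mean squared difference, the off-diagonal weight `off_diag_weight`, the mixing weight `reg_weight`, and the plain sum used to combine the two terms are all assumptions made for illustration.

```python
import numpy as np


def pairwise_distances(x: np.ndarray) -> np.ndarray:
    """Euclidean distance between every pair of rows of x (shape: n x d)."""
    diff = x[:, None, :] - x[None, :, :]  # shape: n x n x d
    return np.sqrt((diff ** 2).sum(axis=-1))


def distance_matching_loss(positions: np.ndarray, latents: np.ndarray) -> float:
    """Compare D_physical with D_latent.

    Assumption: "L2" is taken here as the mean squared difference
    between the two n x n distance matrices.
    """
    d_physical = pairwise_distances(positions)
    d_latent = pairwise_distances(latents)
    return float(np.mean((d_physical - d_latent) ** 2))


def barlow_twins_term(z_a: np.ndarray, z_b: np.ndarray,
                      off_diag_weight: float = 5e-3) -> float:
    """Barlow Twins style redundancy-reduction penalty for two views.

    Pushes the cross-correlation matrix of the two embedding batches
    toward the identity. off_diag_weight is an assumed hyperparameter.
    """
    n = z_a.shape[0]
    # Standardize each latent dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + 1e-8)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + 1e-8)
    c = z_a.T @ z_b / n  # cross-correlation matrix, shape: d x d
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return float(on_diag + off_diag_weight * off_diag)


def isimed_style_loss(positions: np.ndarray, z_a: np.ndarray, z_b: np.ndarray,
                      reg_weight: float = 1.0) -> float:
    """Hypothetical combination of both terms (weighting is an assumption)."""
    return distance_matching_loss(positions, z_a) + reg_weight * barlow_twins_term(z_a, z_b)
```

Passing the same coordinates as both `positions` and `latents` to `distance_matching_loss` gives a loss of zero, since the two distance matrices coincide. Note that the latent vectors need not have the same dimensionality as the physical coordinates, because only the n x n distance matrices are compared.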