Can representation learning for multimodal image registration be improved by supervision of intermediate layers?

Elisabeth Wetzer, Joakim Lindblad, Nataša Sladoje

2023-03-01

Abstract: Multimodal imaging and correlative analysis typically require image alignment. Contrastive learning can generate representations of multimodal images, reducing the challenging task of multimodal image registration to a monomodal one. Previously, additional supervision on intermediate layers in contrastive learning has improved biomedical image classification. We evaluate whether a similar approach improves the representations learned for registration and thereby boosts registration performance. We explore three approaches to adding contrastive supervision to the latent features of the bottleneck layer in the U-Nets encoding the multimodal images, and evaluate three different critic functions. Our results show that representations learned without additional supervision on latent features perform best in the downstream task of registration on two public biomedical datasets. We investigate the performance drop by drawing on recent insights from contrastive learning for classification and from self-supervised learning. We visualize the spatial relations of the learned representations by means of multidimensional scaling, and show that additional supervision on the bottleneck layer can lead to partial dimensional collapse of the intermediate embedding space.

Computer Vision and Pattern Recognition, Machine Learning
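The abstract's contrastive setup hinges on a critic function that scores how similar two representations are. As an illustration, here is a minimal sketch of a symmetric InfoNCE loss with three interchangeable critics. It assumes NumPy, batched embeddings in which row `i` of each modality forms the positive pair, and a temperature `tau`. The choice of critics (negative L1 distance, negative mean squared distance, cosine similarity), the temperature value, and all function names are illustrative assumptions; this is not the authors' code, and CoMIR's published loss may differ in details such as how negatives are drawn.

```python
import numpy as np


def critic_l1(a, b):
    # Negative mean absolute difference between every row of a and every row of b -> (n, n)
    return -np.abs(a[:, None, :] - b[None, :, :]).mean(axis=-1)


def critic_l2(a, b):
    # Negative mean squared difference between every pair of rows -> (n, n)
    return -((a[:, None, :] - b[None, :, :]) ** 2).mean(axis=-1)


def critic_cosine(a, b, eps=1e-8):
    # Cosine similarity between every pair of rows -> (n, n)
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a_n @ b_n.T


def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def info_nce(a, b, critic, tau=0.1):
    """Symmetric InfoNCE: (a[i], b[i]) is the positive pair, and every other
    row of the opposite modality serves as a negative."""
    logits = critic(a, b) / tau  # (n, n); the diagonal holds the positive pairs
    loss_ab = -np.diag(log_softmax(logits, axis=1)).mean()  # match each a[i] among all b
    loss_ba = -np.diag(log_softmax(logits, axis=0)).mean()  # match each b[i] among all a
    return 0.5 * (loss_ab + loss_ba)
```

With nearly identical inputs for the two "modalities", every critic yields a much lower loss than with unrelated inputs, which is the signal that pulls corresponding patches together in the shared representation space.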

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

This paper explores whether supervising the representations learned at intermediate layers (in particular, the bottleneck layer) can improve multimodal image registration. Specifically, the authors evaluate ways of adding supervision to the bottleneck features of the U-Nets within a contrastive learning framework, with the aim of producing better contrastive multimodal image representations (CoMIRs) and thereby improving registration performance.

### Background and Motivation

Multimodal imaging captures complementary information about a sample, which is valuable in digital pathology. However, images produced by different sensors can look very different, which makes automatic multimodal image registration very challenging. Manual registration is time-consuming, labor-intensive, and costly, so reliable automated multimodal registration methods are crucial for both research and clinical applications.

Recent studies have shown that adding supervision to intermediate layers in contrastive learning can improve representation learning for biomedical image classification. Building on this finding, the authors investigate whether a similar approach can improve the quality of CoMIRs for multimodal image registration.

### Methods and Experiments

The authors propose three ways of adding a contrastive loss on the bottleneck features of the U-Nets:

1. **Alternating Loss**: Alternate, from one iteration to the next, between computing the contrastive loss on the final output layer and on the bottleneck layer.
2. **Weighted Loss**: In each iteration, compute the contrastive losses on both the final output layer and the bottleneck layer, and combine them with a weighting hyperparameter.
3. **Pre-training**: Pre-train with the loss on the bottleneck layer for 50 epochs, then train with the loss on the final output layer for another 50 epochs.
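The three schedules differ only in which contrastive loss drives each update. The selection logic can be sketched as follows, assuming each loss is supplied as a zero-argument callable (in a real training loop these would return framework tensors for backpropagation). The function name `training_loss`, the schedule labels, the choice that the alternating schedule uses the output loss on even iterations, and the additive form `L_out + weight * L_bottleneck` of the weighted loss are hypothetical; only the 50 + 50 epoch split comes from the description above.

```python
from typing import Callable


def training_loss(schedule: str, epoch: int, iteration: int,
                  loss_output: Callable[[], float],
                  loss_bottleneck: Callable[[], float],
                  weight: float = 0.5,
                  pretrain_epochs: int = 50) -> float:
    """Return the loss to minimise at this step under a given (hypothetical) schedule."""
    if schedule == "baseline":
        # No additional supervision: only the final output layer is trained contrastively
        return loss_output()
    if schedule == "alternating":
        # Assumed order: even iterations use the output loss, odd ones the bottleneck loss
        return loss_output() if iteration % 2 == 0 else loss_bottleneck()
    if schedule == "weighted":
        # Assumed additive form; the exact weighting used in the paper may differ
        return loss_output() + weight * loss_bottleneck()
    if schedule == "pretrain":
        # Bottleneck loss for the first `pretrain_epochs` epochs, output loss afterwards
        return loss_bottleneck() if epoch < pretrain_epochs else loss_output()
    raise ValueError(f"unknown schedule: {schedule!r}")
```

For example, with `loss_output` returning 1.0 and `loss_bottleneck` returning 10.0, the weighted schedule with `weight=0.5` gives 6.0, while the pre-training schedule returns the bottleneck loss up to epoch 49 and the output loss from epoch 50 onward. Passing callables rather than values means a schedule only evaluates the loss it actually uses.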
The authors conducted experiments on two public biomedical datasets: the SHG & BF dataset and the QPI & FM dataset. Evaluation metrics included the registration success rate (RSR) and various image similarity/distance measures.

### Results and Discussion

The experimental results show that the baseline method, trained without additional supervision, achieved the best registration performance on both datasets. Specifically:

- On the SHG & BF dataset, the weighted loss method with the L1 norm as the critic (similarity) function performed best among the variants with additional supervision, but still did not surpass the baseline.
- On the QPI & FM dataset, the pre-training method outperformed the alternating loss method but still fell short of the baseline.

Further analysis suggested that additional supervision of the bottleneck layer can lead to partial dimensional collapse of the intermediate feature space, which may degrade registration performance. Moreover, visualizing the embedding space with multidimensional scaling (MDS) showed that, with additional supervision, features clustered by modality rather than by cross-modal similarity.

### Conclusion

This study indicates that for multimodal image registration, CoMIRs generated without additional supervision perform best in the downstream task. This contrasts with previous observations in biomedical image classification, suggesting that different tasks place different requirements on representation learning. Future work could further explore how to optimize representation learning for multimodal image registration.
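One common way to check for the dimensional collapse mentioned above is to inspect the spectrum of the embedding covariance matrix: collapsed dimensions appear as singular values near zero. The sketch below assumes NumPy and embeddings flattened to shape `(n_samples, dim)` (for example, bottleneck feature vectors sampled across patches). The names `embedding_spectrum` and `effective_rank` and the relative tolerance `tol` are hypothetical choices, not the paper's analysis code.

```python
import numpy as np


def embedding_spectrum(z):
    """Singular values of the covariance matrix of embeddings z with shape
    (n_samples, dim), in descending order. Values near zero indicate
    directions the embeddings do not use, i.e. collapsed dimensions."""
    z = z - z.mean(axis=0, keepdims=True)       # centre each dimension
    cov = z.T @ z / (z.shape[0] - 1)            # (dim, dim) sample covariance
    return np.linalg.svd(cov, compute_uv=False)  # NumPy returns these sorted descending


def effective_rank(z, tol=1e-6):
    """Number of covariance singular values above `tol` times the largest one."""
    s = embedding_spectrum(z)
    return int((s > tol * s[0]).sum())
```

As a quick check, 500 Gaussian embeddings in 32 dimensions have an effective rank of 32, whereas embeddings generated from only 4 latent factors and mapped into 32 dimensions have an effective rank of 4. In practice one would usually plot the log of the full spectrum, since partial collapse shows up as a sharp drop rather than exact zeros.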
