A Semi-Supervised Image Registration Framework Based on Multimodal Cross-Attention

Ming Zhao, Jingyi Liu, Yan Wu
DOI: https://doi.org/10.1109/lgrs.2024.3397993
IF: 5.343
2024-05-22
IEEE Geoscience and Remote Sensing Letters
Abstract: Registration of multimodal image pairs is a fundamental task in many remote sensing applications. To achieve accurate and low-cost remote sensing image registration, we propose a semi-supervised image registration framework based on multimodal cross-attention, which consists of an encoder for feature extraction, a multimodal cross-attention module, and detection/descriptor decoders. We apply positional encoding to the feature maps to enrich the features with spatial context, which is especially useful for remote sensing images with large geometrical deformations. To learn modality-independent features shared by multimodal images, we propose a multimodal cross-attention module that extracts cross-modal features and helps the detector extract more reliable matching keypoints. The network is trained in a semi-supervised manner, which requires only a small dataset of incompletely labeled images. To learn reliable keypoints from image pairs with inconsistent intensity and geometrical deformations, we randomly establish different geometrical mappings for the multimodal image pairs during training and then enrich the keypoint labels by continuously adding reliable keypoints extracted by the detection decoder in each epoch. Experimental results show that the proposed method achieves more comprehensive and accurate registration than state-of-the-art methods for multimodal remote sensing images.
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics
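
The cross-attention step named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no layer sizes, modality names, or encoding formula, so the optical/SAR labels, the single attention head, the residual connection, the 1-D sinusoidal encoding over flattened feature-map tokens, and the random projection matrices are all assumptions chosen for illustration.

```python
import numpy as np

def sinusoidal_positions(num_tokens, dim):
    """Standard sine/cosine positional encoding, shape (num_tokens, dim).

    Assumption: the paper's actual encoding is not given in the abstract.
    """
    pos = np.arange(num_tokens)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    # Even channels use sine, odd channels use cosine.
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Single-head cross-attention: queries come from one modality,
    keys and values from the other. Returns the fused features
    (with a residual connection) and the attention weights."""
    q = query_feats @ w_q
    k = context_feats @ w_k
    v = context_feats @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return query_feats + weights @ v, weights

# Hypothetical 4x4 encoder feature maps with 8 channels, flattened to 16 tokens.
rng = np.random.default_rng(0)
h, w, dim = 4, 4, 8
pe = sinusoidal_positions(h * w, dim)
optical = rng.normal(size=(h * w, dim)) + pe  # stand-in for modality A features
sar = rng.normal(size=(h * w, dim)) + pe      # stand-in for modality B features

# Random projections stand in for learned weights.
w_q, w_k, w_v = (rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))

fused, weights = cross_attention(optical, sar, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Each row of `weights` describes how strongly one token of the first modality attends to every token of the second, which is the mechanism by which features shared across modalities can be emphasized before keypoint detection.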