Interpretable Matching of Optical-SAR Image Via Dynamically Conditioned Diffusion Models

Shuiping Gou,Xin Wang,Xinlin Wang,Yunzhi Chen
DOI: https://doi.org/10.1145/3664647.3681700
2024-01-01
Abstract:Driven by the complementary information fusion of optical and synthetic aperture radar (SAR) images, the optical-SAR image matching has drawn much attention. However, the significant radiometric differences between them imposes great challenges on accurate matching. Most existing approaches convert SAR and optical images into a shared feature space to perform the matching, but these methods often fail to achieve the robust matching since the feature spaces are unknown and uninterpretable. Motivated by the interpretable latent space of diffusion models, this paper formulates an optical-SAR image translation and matching framework via a dynamically conditioned diffusion model (DCDM) to achieve the interpretable and robust optical-SAR cross-modal image matching. Specifically, in the denoising process, to filter out outlier matching regions, a gated dynamic sparse cross-attention module is proposed to facilitate efficient and effective long-range interactions of multi-grained features between the cross-modal data. In addition, a spatial position consistency constraint is designed to promote the cross-attention features to perceive the spatial corresponding relation in different modalities, improving the matching precision. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods in terms of both the matching accuracy and the interpretability.
What problem does this paper attempt to address?