Abstract: We propose a novel unsupervised cross-modal homography estimation framework, named InterNet, based on interleaved modality transfer and self-supervised homography prediction. InterNet integrates modality transfer and self-supervised homography estimation, introducing an innovative interleaved optimization framework to alternately promote both components. The modality transfer gradually narrows the modality gaps, allowing the self-supervised homography estimation to fully leverage the synthetic intra-modal data. The self-supervised homography estimation progressively achieves reliable predictions, thereby providing robust cross-modal supervision for the modality transfer. To further boost estimation accuracy, we also formulate a fine-grained homography feature loss to strengthen the connection between the two components. Furthermore, we employ a simple yet effective distillation training technique to reduce model parameters and improve cross-domain generalization while maintaining comparable performance. Experiments show that InterNet achieves state-of-the-art (SOTA) performance among unsupervised methods and even outperforms many supervised methods such as MHN and LocalTrans.
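The abstract relies on self-supervised homography estimation over synthetic intra-modal data, but it does not say how that data is generated. This sketch assumes a common recipe from deep homography work; the paper does not confirm it. The recipe perturbs the four corners of an image patch at random, solves for the homography that maps the original corners to the perturbed ones, and warps the patch with that homography, so the label is known by construction. The NumPy sketch below covers only the label side. `make_synthetic_label`, `homography_from_4pts`, `warp_points`, and the default patch size and shift range are hypothetical choices for illustration.

```python
import numpy as np


def homography_from_4pts(src, dst):
    """Solve for the 3x3 homography H mapping 4 src points to 4 dst points.

    Uses the direct linear transform with h33 fixed to 1, which gives an
    8x8 linear system (two equations per point correspondence).
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.asarray(A, dtype=float), np.asarray(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def warp_points(H, pts):
    """Apply H to an (N, 2) array of points using homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]


def make_synthetic_label(patch_size=128, max_shift=32, rng=None):
    """Hypothetical helper: draw a random homography as a 4-corner perturbation.

    Returns the patch corners, the perturbed corners (the 4-point label a
    network would regress), and the equivalent 3x3 homography.
    """
    rng = np.random.default_rng(rng)
    s = patch_size - 1
    src = np.array([[0, 0], [s, 0], [s, s], [0, s]], dtype=float)
    dst = src + rng.uniform(-max_shift, max_shift, size=src.shape)
    return src, dst, homography_from_4pts(src, dst)


# Usage: the image patch itself would then be warped with H (for example with
# OpenCV's cv2.warpPerspective) to form an intra-modal training pair whose
# ground-truth homography is known by construction.
src, dst, H = make_synthetic_label(rng=0)
print(np.round(dst - src, 2))                  # 4-point offsets used as the label
print(np.allclose(warp_points(H, src), dst))   # True
```

Fixing h33 = 1 keeps the system square, with 8 unknowns and 8 equations. This choice fails only when h33 = 0, and that cannot happen here. The corner (0, 0) maps to a finite point, and its homogeneous scale is exactly h33.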

What problem does this paper attempt to address?

The problem this paper addresses is homography estimation between cross-modal images. Specifically, the authors propose InterNet, a new unsupervised framework for estimating the homography between images of different modalities. The problem matters in practice for computer vision tasks such as robot localization without GPS signals, multi-modal image inpainting, and multi-spectral image fusion.

### Problem Background

Supervised deep homography estimation methods rely on labeled data. In practice, however, multi-modal images are captured by different imaging sensors, so the true homography between them is usually unknown and sufficient labeled data is hard to obtain. Existing unsupervised methods instead estimate the homography by optimizing the similarity between the warped source image and the target image, but they perform poorly under large deformations and large modality gaps.

### Core Contributions of the Paper

1. **A new unsupervised cross-modal homography estimation framework, InterNet**:
   - InterNet combines modality transfer with self-supervised homography prediction and optimizes the two modules alternately. This gradually narrows the modality gap and improves the accuracy of cross-modal homography estimation.
2. **An interleaved optimization framework**:
   - Inspired by the alternating direction method of multipliers (ADMM) and the split Bregman method, InterNet adopts an interleaved optimization strategy. The strategy decomposes the complex optimization problem into more tractable sub-problems to achieve better convergence.
3. **Fine-grained Homography Feature Loss (FGHomo Loss)**:
   - To strengthen the mutual promotion between the two modules, the authors propose a fine-grained homography feature loss that constrains feature consistency within the homography estimation module.
4. **Distillation training technique**:
   - A simple distillation training technique significantly reduces the number of model parameters and improves cross-domain generalization while maintaining comparable performance.

### Experimental Results

Experiments show that InterNet achieves state-of-the-art performance among unsupervised methods on multiple datasets and in some cases even outperforms supervised methods. For example, on the GoogleMap dataset, the mean average corner error (MACE) of InterNet is 54.3% lower than that of MHN and 61.8% lower than that of LocalTrans. On the WHU-OPT-SAR dataset, it is 47.4% lower than that of MHN and 85.8% lower than that of LocalTrans.

### Summary

The main contribution of this paper is InterNet, an unsupervised cross-modal homography estimation framework. It optimizes modality transfer and self-supervised homography prediction in an interleaved manner. This lets it tackle homography estimation under large modality gaps and large deformations, and it performs strongly on multiple benchmark datasets.
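The experimental results are stated as relative MACE reductions. A figure such as "54.3% lower than MHN" means that InterNet's MACE equals (1 − 0.543) times MHN's MACE. The minimal NumPy sketch below shows how MACE is commonly computed: the Euclidean distance between predicted and ground-truth corner positions, averaged over corners and image pairs. It also shows how a relative reduction follows from two MACE values. The function names are hypothetical. The exact averaging used in the paper's evaluation is an assumption, and the data is a toy example, not results from the paper.

```python
import numpy as np


def mace(pred_corners, gt_corners):
    """Mean average corner error.

    pred_corners, gt_corners: arrays of shape (N, 4, 2) holding the four
    warped corner positions per image pair. Returns the Euclidean corner
    error averaged over all corners and all pairs.
    """
    pred = np.asarray(pred_corners, dtype=float)
    gt = np.asarray(gt_corners, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def relative_reduction(ours, baseline):
    """Percentage by which `ours` is lower than `baseline` (both MACE values)."""
    return 100.0 * (baseline - ours) / baseline


# Toy data, not results from the paper: every predicted corner is off by (3, 4).
gt = np.zeros((2, 4, 2))
pred = gt + np.array([3.0, 4.0])
print(mace(pred, gt))                 # 5.0
print(relative_reduction(2.0, 4.0))   # 50.0
```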

InterNet: Unsupervised Cross-modal Homography Estimation Based on Interleaved Modality Transfer and Self-supervised Homography Prediction

MCNet: Rethinking the Core Ingredients for Accurate and Efficient Homography Estimation

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Learning Disentangled Representation for Cross-Modal Retrieval with Deep Mutual Information Estimation

SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning

Multimodal Image-to-Image Translation via Mutual Information Estimation and Maximization

Recurrent Homography Estimation Using Homography-Guided Image Warping and Focus Transformer

Self-Supervised Deep Homography Estimation with Invertibility Constraints

CrossHomo: Cross-Modality and Cross-Resolution Homography Estimation

Learning Inter- and Intra-frame Representations for Non-Lambertian Photometric Stereo

Unsupervised deep homography with multi-scale global attention

Coarse-to-Fine Homography Estimation for Infrared and Visible Images

Deep Unsupervised Homography Estimation for Single-Resolution Infrared and Visible Images Using GNN

Homography Decomposition Networks for Planar Object Tracking

Unsupervised Homography Estimation on Multimodal Image Pair via Alternating Optimization

Content-Aware Unsupervised Deep Homography Estimation and its Extensions

Iterative Deep Homography Estimation

Unsupervised Deep Homography: A Fast and Robust Homography Estimation Model

Deep Learning based Inter-Modality Image Registration Supervised by Intra-Modality Similarity

Mind the Gap: Learning Modality-Agnostic Representations With a Cross-Modality UNet

Deep Homography Estimation with Pairwise Invertibility Constraint