Abstract: Accurate multispectral image matching presents significant challenges due to non-linear intensity variations across spectral modalities, extreme viewpoint changes, and the scarcity of labeled datasets. Current state-of-the-art methods are typically specialized for a single spectral difference, such as visible-infrared, and struggle to adapt to other modalities due to their reliance on expensive supervision, such as depth maps or camera poses. To address the need for rapid adaptation across modalities, we introduce XPoint, a self-supervised, modular image-matching framework designed for adaptive training and fine-tuning on aligned multispectral datasets, allowing users to customize key components based on their specific tasks. XPoint employs modularity and self-supervision to allow for the adjustment of elements such as the base detector, which generates pseudo-ground-truth keypoints invariant to viewpoint and spectrum variations. The framework integrates a VMamba encoder, pretrained on segmentation tasks, for robust feature extraction, and includes three joint decoder heads: two are dedicated to interest point and descriptor extraction, and a task-specific homography regression head imposes geometric constraints for superior performance in tasks like image registration. This flexible architecture enables quick adaptation to a wide range of modalities, demonstrated by training on Optical-Thermal data and fine-tuning on settings such as visual-near infrared, visual-infrared, visual-longwave infrared, and visual-synthetic aperture radar. Experimental results show that XPoint consistently outperforms or matches state-of-the-art methods in feature matching and image registration tasks across five distinct multispectral datasets. Our source code is available at https://github.com/canyagmur/XPoint.
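The abstract does not say how the output of the homography regression head is supervised. One common way to turn a predicted homography into a geometric constraint is the mean corner error: map the four image corners through both the predicted and the ground-truth homography and average the pixel distances between the results. The following is a minimal sketch under that assumption; `apply_homography`, `mean_corner_error`, and the 320 x 240 image size are illustrative choices, not taken from XPoint's code.

```python
import math
from typing import Sequence, Tuple

# Assumption: a corner-displacement error is used here to illustrate how a
# homography regression head can be tied to geometry; it is not confirmed
# to be XPoint's actual loss.


def apply_homography(h: Sequence[Sequence[float]], x: float, y: float) -> Tuple[float, float]:
    """Map pixel (x, y) through a 3x3 homography using homogeneous coordinates."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w


def mean_corner_error(h_pred, h_true, width: float, height: float) -> float:
    """Mean Euclidean distance (pixels) between image corners warped by h_pred and h_true."""
    corners = [(0.0, 0.0), (width - 1.0, 0.0),
               (width - 1.0, height - 1.0), (0.0, height - 1.0)]
    total = 0.0
    for x, y in corners:
        px, py = apply_homography(h_pred, x, y)
        tx, ty = apply_homography(h_true, x, y)
        total += math.hypot(px - tx, py - ty)
    return total / len(corners)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Hypothetical ground truth: a pure translation by (3, 4) pixels.
    shift = [[1.0, 0.0, 3.0], [0.0, 1.0, 4.0], [0.0, 0.0, 1.0]]
    print(mean_corner_error(identity, shift, width=320, height=240))  # 5.0
```

Multiplying a homography by a nonzero constant leaves this error unchanged, since homographies are only defined up to scale.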

What problem does this paper attempt to address?

This paper addresses the challenges of multispectral image matching, chiefly the nonlinear intensity changes between spectral modalities, extreme viewpoint changes, and the scarcity of labeled datasets. Existing state-of-the-art methods usually target a single spectral difference (such as visible-infrared) and adapt poorly to other modalities because they rely on expensive supervision, such as depth maps or camera poses. Specifically, the paper aims to solve the following problems:

1. **Cross-Modality Adaptability**: Existing methods perform poorly across different spectral modalities and are difficult to generalize to unseen ones.
2. **Scarcity of Labeled Data**: Supervised multispectral image matching requires large amounts of labeled data, which are often difficult to obtain.
3. **Viewpoint and Spectral Changes**: The appearance of multispectral images changes significantly across viewpoints and spectra, which makes matching harder.

To address these problems, the paper proposes XPoint, a self-supervised, modular image-matching framework for adaptive training and fine-tuning on aligned multispectral datasets, which lets users customize key components for their specific tasks. The main contributions of XPoint are:

- **Multispectral Homographic Adaptation**: An improved multispectral homographic adaptation procedure generates a set of pseudo-ground-truth keypoints that are invariant to viewpoint and spectral changes.
- **Pretrained VMamba Encoder**: A VMamba encoder pretrained on segmentation tasks strengthens feature extraction.
- **Geometrically Constrained Regression Head**: A task-specific homography regression head imposes geometric constraints to improve matching performance.
- **Improved Detector Loss**: For datasets with large spectral differences (such as VIS-SAR and VIS-NIR), a weighted cross-entropy loss improves the detector's performance under these harder conditions.

With these improvements, XPoint achieves accurate image matching and registration on multiple multispectral datasets, matching or outperforming state-of-the-art methods on five of them.
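The summary does not detail the weighted cross-entropy loss. Assuming a SuperPoint-style detector head, in which every 8 x 8 pixel cell is classified into one of 64 pixel positions or an extra "no keypoint" (dustbin) class, a class-weighted cross-entropy can be sketched as below. The function names and the `keypoint_weight` / `dustbin_weight` values are hypothetical, and XPoint's actual weighting scheme may differ.

```python
import math
from typing import List, Sequence

# Assumption: SuperPoint-style cell classification (keypoint positions plus a
# final dustbin class); the weights are hypothetical, not XPoint's values.


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one cell's class logits."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_norm for z in logits]


def make_class_weights(num_positions: int = 64,
                       keypoint_weight: float = 1.0,
                       dustbin_weight: float = 1.0) -> List[float]:
    """Weights for num_positions keypoint classes followed by one dustbin class (last index)."""
    return [keypoint_weight] * num_positions + [dustbin_weight]


def weighted_cross_entropy(cell_logits: Sequence[Sequence[float]],
                           labels: Sequence[int],
                           class_weights: Sequence[float]) -> float:
    """Class-weighted mean cross-entropy over cells.

    Each cell's negative log-likelihood is scaled by the weight of its target
    class, and the sum is normalized by the total target weight, as PyTorch's
    CrossEntropyLoss does with class weights and mean reduction.
    """
    weighted_nll = 0.0
    weight_sum = 0.0
    for logits, label in zip(cell_logits, labels):
        w = class_weights[label]
        weighted_nll += -w * log_softmax(logits)[label]
        weight_sum += w
    return weighted_nll / weight_sum


if __name__ == "__main__":
    # Toy example: 2 keypoint positions + 1 dustbin (index 2) instead of 64 + 1.
    logits = [[0.0, 0.0, 10.0],   # predicts "no keypoint"; label is dustbin -> small loss
              [0.0, 0.0, 10.0]]   # predicts "no keypoint"; label is position 0 -> large loss
    labels = [2, 0]
    uniform = make_class_weights(2, keypoint_weight=1.0, dustbin_weight=1.0)
    upweighted = make_class_weights(2, keypoint_weight=4.0, dustbin_weight=1.0)
    print(weighted_cross_entropy(logits, labels, uniform))
    print(weighted_cross_entropy(logits, labels, upweighted))
```

In this toy run the loss rises from about 5.0 to about 8.0 when keypoint classes are weighted four times more than the dustbin, so a missed keypoint costs more. Counteracting the imbalance between rare keypoint cells and abundant empty cells is one plausible motivation for such weighting.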

XPoint: A Self-Supervised Visual-State-Space based Architecture for Multispectral Image Registration

Heterogeneous self-supervised interest point matching for multi-modal remote sensing image registration

Improved Robust Kernel Subspace for Object-Based Registration and Change Detection

An Automatic Registration Approach to Laser Point Sets Based on Multidiscriminant Parameter Extraction

Multispectral Snapshot Image Registration Using Learned Cross Spectral Disparity Estimation and a Deep Guided Occlusion Reconstruction Network

Differentiable Registration of Images and LiDAR Point Clouds with VoxelPoint-to-Pixel Matching

Self-Supervised Keypoint Detection and Cross-Fusion Matching Networks for Multimodal Remote Sensing Image Registration

Multifeature Alignment and Matching Network for SAR and Optical Image Registration

A Fast and Fully Automatic Registration Approach Based on Point Features for Multi-Source Remote-Sensing Images

A robust and accurate feature matching method for multi-modal geographic images spatial registration

A Semi-Supervised Image Registration Framework Based on Multimodal Cross-Attention

A Global-to-Local Algorithm for High-Resolution Optical and SAR Image Registration

MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding

Fast and Robust Matching for Multimodal Remote Sensing Image Registration

PointMBF: A Multi-scale Bidirectional Fusion Network for Unsupervised RGB-D Point Cloud Registration

Multiview 2D/3D Rigid Registration via a Point-Of-Interest Network for Tracking and Triangulation

Automatic Optical-to-SAR Image Registration by Iterative Line Extraction and Voronoi Integrated Spectral Point Matching

Geometric- and Optimization-Based Registration Methods for Long-Wave Infrared Hyperspectral Images

A Strong Baseline for Point Cloud Registration via Direct Superpoints Matching

XTrack: Multimodal Training Boosts RGB-X Video Object Trackers

Multimodal Remote Sensing Image Registration Based on Adaptive Spectrum Congruency