Abstract: This article tackles the visual localization of unmanned aerial vehicles (UAVs) when multisource and cross-view images are involved. We present a lightweight end-to-end scene graph encoding and matching network that finds the best matches for airborne camera views in reference image maps. The scene graph addresses the challenge of encoding the semantic scene by aggregating convolutional image features into global and structured semiglobal descriptors. The principal contributions of this article are as follows. First, we develop a new network architecture that embeds a nonlocal block and a modified vector of locally aggregated descriptors network (NetVLAD) into a backbone convolutional neural network. The main component of the modified NetVLAD is a cluster similarity masking graph (CSMG) encoder, which replaces the feature-cluster residual computation in NetVLAD with cluster consensus feature aggregation and structure-aware scene graph extraction. In addition, the nonlocal block extracts a discriminative global feature descriptor for each image. Second, we develop a new triplet loss that trains the network to learn features at different semantic levels: the global descriptor and the CSMG encoder are trained jointly with a weighted sum of cosine triplet losses. Third, the global descriptor from the nonlocal block and the semiglobal descriptor from the CSMG encoder work hierarchically for coarse-to-fine image retrieval, achieving real-time efficiency and favorable accuracy when searching and matching images against the reference image map. We train and test the model on two challenging benchmark datasets, and we further evaluate its generalizability by testing the pretrained model on a dataset collected by a fixed-wing UAV.
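The training objective described above, a weighted sum of cosine triplet losses over the global and semiglobal descriptors, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the cosine triplet loss is taken here as max(0, m - cos(a, p) + cos(a, n)), and the margin `m`, the weights `w_global`/`w_semi`, the function names, and the treatment of the semiglobal descriptor as a flat vector are all hypothetical choices.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D descriptors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cosine_triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss that is zero once cos(anchor, positive) exceeds
    cos(anchor, negative) by at least `margin` (margin value is an assumption)."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))

def weighted_triplet_loss(global_triplet, semi_triplet,
                          w_global=0.5, w_semi=0.5, margin=0.2):
    """Weighted sum of the per-level triplet losses.

    Each argument is an (anchor, positive, negative) tuple of descriptors.
    The weights are hypothetical placeholders, not values from the paper.
    """
    return (w_global * cosine_triplet_loss(*global_triplet, margin=margin)
            + w_semi * cosine_triplet_loss(*semi_triplet, margin=margin))

# Toy descriptors: the positive matches the anchor, the negative is orthogonal.
a = np.array([1.0, 0.0])
p = np.array([1.0, 0.0])
n = np.array([0.0, 1.0])
print(cosine_triplet_loss(a, p, n))  # 0.0: positive already wins by more than the margin
print(cosine_triplet_loss(a, n, p))  # about 1.2: roles swapped, so the triplet is violated
```

Summing the two levels lets a single backward pass shape both the global descriptor and the CSMG output; how the weights should be balanced is not specified here and would need to follow the paper.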
The benchmark evaluations and ablation experiments show that the proposed method outperforms state-of-the-art methods in the real-time matching of UAV images against reference image maps for UAV visual localization. Open-source code is available on GitHub.
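The hierarchical, coarse-to-fine search mentioned in the abstract can be illustrated with a two-stage retrieval: rank every reference tile by the cosine similarity of its global descriptor to the query's, keep a shortlist, then re-rank only that shortlist with the semiglobal descriptors. This is a hedged sketch rather than the paper's matching procedure. The shortlist size `k`, the function names, and the representation of the semiglobal descriptor as a flat vector are assumptions; the CSMG encoder produces structured scene-graph descriptors whose comparison may be more involved.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize along the last axis so dot products equal cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def coarse_to_fine_search(q_global, q_semi, ref_global, ref_semi, k=5):
    """Return indices of shortlisted reference tiles, best match first.

    q_global:   (Dg,) global descriptor of the UAV query image.
    q_semi:     (Ds,) flattened semiglobal descriptor of the query (assumption).
    ref_global: (N, Dg) global descriptors of the reference map tiles.
    ref_semi:   (N, Ds) flattened semiglobal descriptors of the tiles.
    k:          shortlist size for the coarse stage (hypothetical value).
    """
    # Coarse stage: score every tile with the compact global descriptor.
    coarse_scores = l2_normalize(ref_global) @ l2_normalize(q_global)
    k = min(k, len(coarse_scores))
    shortlist = np.argsort(-coarse_scores)[:k]
    # Fine stage: re-rank only the shortlist with the richer semiglobal descriptor.
    fine_scores = l2_normalize(ref_semi[shortlist]) @ l2_normalize(q_semi)
    return shortlist[np.argsort(-fine_scores)]

# Toy map with four tiles: tiles 0 and 2 survive the coarse stage,
# and the semiglobal descriptors flip their order in the fine stage.
ref_global = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
ref_semi = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
ranking = coarse_to_fine_search(np.array([1.0, 0.0]), np.array([1.0, 0.0]),
                                ref_global, ref_semi, k=2)
print(ranking.tolist())  # [2, 0]
```

The design idea this illustrates is that the full scan over the map uses only the cheap global comparison, while the costlier fine comparison runs on just `k` candidates, which is one plausible route to the real-time behavior the abstract reports.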

VisIRNet: Deep Image Alignment for UAV-Taken Visible and Infrared Image Pairs

Infrared and Visible Image Registration in UAV Inspection

Multiview Image Matching of Optical Satellite and UAV Based on a Joint Description Neural Network

A multi-level image alignment method for aerial image and road-based geo-parcel data

VL-MFL: UAV Visual Localization Based on Multisource Image Feature Learning

Alleviating Spatial Misalignment and Motion Interference for UAV-based Video Recognition

Efficient Fourier Filtering Network with Contrastive Learning for UAV-based Unaligned Bi-modal Salient Object Detection

A Robust Infrared and Visible Image Registration Method for Dual Sensor UAV System

Real-Time Cross-View Image Matching and Camera Pose Determination for Unmanned Aerial Vehicles

LLFE: A Novel Learning Local Features Extraction for UAV Navigation Based on Infrared Aerial Image and Satellite Reference Image Matching

Jointly modeling association and motion cues for robust infrared UAV tracking

A Scene Graph Encoding and Matching Network for UAV Visual Localization

Cross-view UAV image matching and localization using deep convolution features

Leveraging Map Retrieval and Alignment for Robust UAV Visual Geo-Localization

Infrared and Visible Image Fusion with Deep Neural Network in Enhanced Flight Vision System

Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection

Sequence Matching for Image-Based UAV-to-Satellite Geolocalization

Leveraging edge detection and neural networks for better UAV localization

General cross-modality registration framework for visible and infrared UAV target image registration

Deep Neural Network Architecture Search for Accurate Visual Pose Estimation aboard Nano-UAVs