Junhuan Liu, Yichen Ma, San Jiang, Lizhe Wang, Qingquan Li, Wanshou Jiang
a School of Computer Science, China University of Geosciences, Wuhan, China
b Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Guangdong Shenzhen, China
c Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan, China
d State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China

Abstract: Structure from Motion (SfM) is a 3D reconstruction framework that has achieved great success on large-scale Unmanned Aerial Vehicle (UAV) images. Because feature matching is time-consuming, overlapping match pairs are usually selected by image retrieval to improve efficiency. Bag of Words (BoW) is the retrieval method commonly used in existing SfM systems. However, the large number of local features and the high dimension of BoW vectors make image retrieval itself time-consuming. Recently, learned global features of lower dimension and more efficient feature aggregation methods have offered solutions to this problem. In addition, efficient approximate nearest neighbour (ANN) searching can further accelerate image retrieval. This study therefore evaluates image retrieval methods for UAV images in SfM-based reconstruction. First, image retrieval methods with varying combinations of feature descriptors, aggregation strategies, and nearest neighbour (NN) searching algorithms are reviewed and configured for performance evaluation. Second, the selected methods are evaluated in SfM-based reconstruction, in which the image retrieval results are fed into the workflow to guide feature matching and are then exploited to create the weighted view graph for parallel SfM reconstruction. Finally, comprehensive tests are conducted on three large-scale UAV datasets to evaluate the performance of the selected methods. The experimental results show that: (1) for feature aggregation and NN searching, Vector of Locally Aggregated Descriptors (VLAD) outperforms the other aggregation strategies, and Hierarchical Navigable Small World (HNSW) performs better in NN searching; (2) among the evaluated feature descriptors, when combined with VLAD and HNSW, SIFT still achieves higher retrieval accuracy than the learned local and global features. In summary, the optimal image retrieval method consists of SIFT, VLAD, and HNSW; its retrieval accuracy is about 2% higher than that of BoW, and its efficiency is around 100 times that of BoW.
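
The aggregation-plus-search pipeline summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `vlad` and `top_k_pairs` are hypothetical, and the codebook is assumed to have been trained beforehand (for example by k-means on sampled descriptors). The power normalisation step is a common VLAD choice that the abstract does not confirm. Exact brute-force search stands in for HNSW here so that the example depends only on NumPy.

```python
import numpy as np


def vlad(descriptors, codebook):
    """Aggregate local descriptors (n, d) into one VLAD vector of length k * d.

    `codebook` is a (k, d) array of cluster centres, assumed pre-trained.
    """
    k, d = codebook.shape
    # Squared Euclidean distance from every descriptor to every codeword.
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)

    v = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            # Sum of residuals between descriptors and their assigned codeword.
            v[i] = (members - codebook[i]).sum(axis=0)

    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))  # power normalisation (assumed step)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def top_k_pairs(vectors, k):
    """Return, for each image, the indices of its k most similar images.

    Exact cosine-similarity search on unit vectors; an ANN index such as
    HNSW would replace this step in a large-scale setting.
    """
    sims = vectors @ vectors.T
    np.fill_diagonal(sims, -np.inf)  # an image must not match itself
    return np.argsort(-sims, axis=1)[:, :k]
```

Each row of the array returned by `top_k_pairs` lists candidate match pairs for one image. In a workflow like the one described above, those pairs would then guide feature matching, and the brute-force search would be the part to swap for an approximate index as the image count grows.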

Matchable image retrieval for large-scale UAV images: an evaluation of SfM-based reconstruction
