Abstract: Using unmanned aerial vehicles (UAVs) for remote sensing offers high flexibility, convenient operation, low cost, and a wide application range. It meets the need for rapid acquisition of high-resolution aerial images in modern photogrammetry applications. Because of insufficient parallax and the computation-intensive process, dense real-time reconstruction of large terrain scenes is a considerable challenge. To address these problems, we propose a novel SLAM-based MVS (Multi-View Stereo) approach that incrementally generates a dense 3D (three-dimensional) model of the terrain from the continuous image stream during flight. The pipeline starts with pose estimation based on a SLAM algorithm. The tracked frames are then selected by a novel scene-adaptive keyframe selection method to construct a sliding-window frame set. This is followed by depth estimation using a flexible search domain approach, which improves accuracy without increasing the iteration time or memory consumption. The whole system was implemented on an embedded GPU on board a UAV platform. We propose a highly parallel and memory-efficient CUDA-based depth computing architecture, which enables the system to achieve good real-time performance. The evaluation experiments were carried out in both simulated and real-world environments. A virtual large terrain scene was built using the Gazebo simulator. A simulated UAV equipped with an RGB-D camera was used to obtain synthetic evaluation datasets, which were divided by flight altitude (800, 1000, and 1200 m) and terrain height difference (100, 200, and 300 m). In addition, the system was extensively tested on various types of real scenes. A comparison with commercial 3D reconstruction software was carried out to evaluate the precision on real-world data.
On the synthetic datasets, over 93.462% of the estimates have an absolute error distance of less than 0.9%. In the real-world dataset captured at a flight height of 800 m, more than 81.27% of the points in our estimated point cloud differ by less than 5 m from the results of Photoscan. All evaluation experiments show that the proposed approach outperforms state-of-the-art approaches in terms of accuracy and efficiency.
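The two accuracy figures above are both "fraction of estimates within a tolerance" statistics. The following is a minimal sketch of how such statistics could be computed, not the authors' evaluation code. The abstract does not say what the 0.9% is measured against, so the sketch assumes it is the absolute depth error divided by the ground-truth depth. The function names and the per-point pairing of estimates with reference values are likewise hypothetical.

```python
import math


def fraction_within_relative_error(estimated, ground_truth, threshold=0.009):
    """Fraction of valid samples with |est - gt| / gt below `threshold`.

    Assumption: the error is normalised by the ground-truth depth.
    Samples with non-finite values or non-positive ground truth are skipped.
    """
    valid = 0
    within = 0
    for est, gt in zip(estimated, ground_truth):
        if not (math.isfinite(est) and math.isfinite(gt)) or gt <= 0:
            continue
        valid += 1
        if abs(est - gt) / gt < threshold:
            within += 1
    return within / valid if valid else 0.0


def fraction_within_distance(estimated, reference, max_diff_m=5.0):
    """Fraction of valid samples whose absolute difference is below `max_diff_m`.

    Assumption: each estimated value is already paired with its reference
    value (for example, a height from a commercial reconstruction).
    """
    valid = 0
    within = 0
    for est, ref in zip(estimated, reference):
        if not (math.isfinite(est) and math.isfinite(ref)):
            continue
        valid += 1
        if abs(est - ref) < max_diff_m:
            within += 1
    return within / valid if valid else 0.0
```

With this definition, a result above 0.93462 from the first function at `threshold=0.009` would correspond to the synthetic-data claim, and a result above 0.8127 from the second at `max_diff_m=5.0` to the real-world claim.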

TerrainMesh: Metric-Semantic Terrain Reconstruction From Aerial Images Using Joint 2-D-3-D Learning

Mesh Reconstruction from Aerial Images for Outdoor Terrain Mapping Using Joint 2D-3D Learning

Mesh-LOAM: Real-time Mesh-Based LiDAR Odometry and Mapping

Sat-Mesh: Learning Neural Implicit Surfaces for Multi-View Satellite Reconstruction

Semantic Reconstruction based on RGB Image and Sparse Depth

Fully Automated Photogrammetric Data Segmentation and Object Information Extraction Approach for Creating Simulation Terrain

Semantic 3D Reconstruction with Learning MVS and 2D Segmentation of Aerial Images

Photometric multi-view mesh refinement for high-resolution satellite images

Onboard Real-Time Dense Reconstruction in Large Terrain Scene Using Embedded UAV Platform

3D Reconstruction of Remote Sensing Mountain Areas with TSDF-Based Neural Networks

Leveraging photogrammetric mesh models for aerial-ground feature point matching toward integrated 3D reconstruction

Reconstructing Occluded Elevation Information in Terrain Maps With Self-Supervised Learning

Real-Time Metric-Semantic Mapping for Autonomous Navigation in Outdoor Environments

Efficient 3D Reconstruction Using Monocular Vision

Large-Scale 3D Terrain Reconstruction Using 3D Gaussian Splatting for Visualization and Simulation

Classifying geospatial objects from multiview aerial imagery using semantic meshes

Semantic Urban Mesh Segmentation Based on Aerial Oblique Images and Point Clouds Using Deep Learning

Multi-Modal Semantic Mesh Segmentation in Urban Scenes

Meshed Up: Learnt Error Correction in 3D Reconstructions

Vis2Mesh: Efficient Mesh Reconstruction from Unstructured Point Clouds of Large Scenes with Learned Virtual View Visibility