Scalable Scene Modeling from Perspective Imaging: Physics-based Appearance and Geometry Inference

Shuang Song

2024-04-02

Abstract: 3D scene modeling techniques serve as the bedrock of geospatial engineering and computer science, driving applications that range from automated driving, terrain mapping, and navigation to virtual, augmented, mixed, and extended reality (for example, in the gaming and movie industries). This dissertation presents a set of contributions that advance the state of the art in 3D scene modeling, in both appearance and geometry modeling. In contrast to the prevailing deep learning methods, the core contribution of this thesis is the development of algorithms that follow first principles, in which sophisticated physics-based models are paired with simpler learning and inference tasks. The resulting processes can consume much larger volumes of data to reconstruct 3D scenes with high accuracy at scale without losing methodological generality, which is not possible with contemporary deep learning methods built on complex models. Specifically, the dissertation introduces three novel methodologies that address the challenges of inferring appearance and geometry through physics-based modeling.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses two main problems in 3D scene modeling: appearance modeling and geometry modeling. Unlike mainstream deep learning methods, its core contribution is the development of algorithms based on first principles that combine sophisticated physical models with simple learning and inference tasks. This approach can process large amounts of data and accurately reconstruct large-scale 3D scenes while maintaining generality, which current deep learning methods based on complex models cannot achieve.

First, the paper addresses the problem of efficiently reconstructing meshes from unordered point clouds, especially for large and complex scenes. The proposed solution combines learned visibility in virtual views with a graph-cut based mesh generation framework: it uses depth to predict visibility in the virtual views and applies adaptive visibility weighting within the graph cut to achieve robust mesh reconstruction.

Second, the paper explores the challenge of merging multiple 3D mesh models, especially those obtained through oblique photogrammetry, into a unified high-resolution scene model. Using a panoramic virtual camera field and a truncated signed distance field, it proposes a new method that handles 3D mesh fusion seamlessly and is particularly suited to standard geoscientific applications with complex topology and polyhedral geometry.

Lastly, the paper presents a physics-based approach to recovering albedo from aerial photogrammetric images. The method recovers albedo with an advanced inverse rendering framework that exploits information specific to photogrammetric datasets, such as the known sun position and the estimable scene geometry.

The paper reports that experiments and comparisons with state-of-the-art methods show these methods to be effective and scalable, providing a foundation for future research and practical applications in 3D scene reconstruction.
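To illustrate why a known sun position and estimated scene geometry make albedo recovery tractable, the following is a minimal sketch under a deliberately simplified model. It is not the dissertation's inverse rendering framework. It assumes a Lambertian surface lit by one directional sun term plus a constant sky term, and every name in it (`sun_direction`, `recover_albedo`, `sun_irradiance`, `sky_irradiance`, `sun_visible`) is hypothetical.

```python
import math


def sun_direction(azimuth_deg, elevation_deg):
    """Unit vector pointing toward the sun in an east-north-up frame.

    Assumption: azimuth is measured clockwise from north, in degrees.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.sin(az) * math.cos(el),
            math.cos(az) * math.cos(el),
            math.sin(el))


def recover_albedo(observed_radiance, normal, sun_dir,
                   sun_irradiance, sky_irradiance, sun_visible=True):
    """Invert the Lambertian model L = (albedo / pi) * E for the albedo.

    E is the total irradiance: the direct sun term (zero when the point
    is in shadow or faces away from the sun) plus a constant sky term.
    Returns None when the point receives no light at all.
    """
    cos_theta = max(0.0, sum(n * s for n, s in zip(normal, sun_dir)))
    direct = sun_irradiance * cos_theta if sun_visible else 0.0
    irradiance = direct + sky_irradiance
    if irradiance <= 0.0:
        return None
    albedo = math.pi * observed_radiance / irradiance
    # Clamp to the physically valid range [0, 1].
    return min(1.0, max(0.0, albedo))


if __name__ == "__main__":
    sun = sun_direction(azimuth_deg=0.0, elevation_deg=90.0)  # sun at zenith
    up = (0.0, 0.0, 1.0)                                      # flat ground
    # Simulate a pixel with true albedo 0.3, then recover it.
    radiance = 0.3 / math.pi * (1000.0 + 100.0)
    print(recover_albedo(radiance, up, sun, 1000.0, 100.0))
```

In this toy model the same surface observed in sunlight and in shadow yields the same albedo once the lighting term is divided out, which is the intuition behind separating reflectance from illumination. A real pipeline would also have to handle cast shadows, interreflections, and sensor response, none of which are modeled here.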
