Scalable Scene Modeling from Perspective Imaging: Physics-based Appearance and Geometry Inference

Shuang Song
2024-04-02
Abstract: 3D scene modeling techniques serve as a bedrock of geospatial engineering and computer science, driving applications that range from automated driving, terrain mapping, and navigation to virtual, augmented, mixed, and extended reality (e.g., for the gaming and movie industries). This dissertation presents a set of contributions that advance the state of the art in 3D scene modeling, covering both appearance and geometry modeling. In contrast to the prevailing deep learning methods, the core contribution of this thesis is the development of algorithms that follow first principles, in which sophisticated physics-based models are introduced alongside simpler learning and inference tasks. The resulting processes can consume much larger volumes of data to reconstruct 3D scenes at scale with high accuracy, without losing methodological generality, which is not possible with contemporary deep learning methods built on complex models. Specifically, the dissertation introduces three novel methodologies that address the challenges of inferring appearance and geometry through physics-based modeling.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses two main problems in 3D scene modeling: appearance modeling and geometry modeling. Unlike mainstream deep learning methods, its core contribution is the development of algorithms based on first principles, which combine sophisticated physical models with simpler learning and inference tasks. This approach can process large volumes of data and accurately reconstruct large-scale 3D scenes while preserving methodological generality, which current deep learning methods built on complex models cannot achieve.

First, the paper addresses the problem of efficiently reconstructing meshes from unordered point clouds, especially for large and complex scenes. The proposed solution combines visibility learned for virtual views with a graph-cut based mesh generation framework: depth is used to predict visibility in virtual views, and an adaptive visibility weighting scheme is integrated into the graph cut to achieve robust mesh reconstruction.

Second, the paper tackles the challenge of merging multiple 3D mesh models, especially those obtained through oblique photogrammetry, into a unified high-resolution scene model. Using a panoramic virtual camera field and a truncated signed distance field (TSDF), it proposes a new method that seamlessly fuses 3D meshes and is particularly suitable for standard geoscientific applications involving complex topology and polyhedral geometry.

Finally, the paper presents a physics-based approach to recovering albedo from aerial photogrammetric images. The method recovers albedo accurately by combining an advanced inverse rendering framework with information specific to photogrammetric datasets, such as the known sun position and the estimable scene geometry.

Rigorous experiments and comparisons with state-of-the-art methods demonstrate that these methods are effective and scalable, laying a solid foundation for future research and practical applications in the rapidly developing field of 3D scene reconstruction.
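The graph-cut labeling behind the point-cloud-to-mesh contribution can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the dissertation's implementation: it assumes the scene has been partitioned into cells (in graph-cut surface reconstruction these are commonly Delaunay tetrahedra), that each cell carries hypothetical visibility-derived costs for being labeled inside or outside, and that adjacent cells share a smoothness weight. The names `min_cut_labels`, `unary` and `pairwise` are illustrative, and the s-t minimum cut is computed with a plain Edmonds-Karp max-flow rather than an optimized solver.

```python
from collections import deque

def min_cut_labels(unary, pairwise, eps=1e-12):
    """Label cells 'outside' (source side) or 'inside' (sink side) via an s-t minimum cut.

    unary[i] = (cost_if_inside, cost_if_outside) for cell i (hypothetical visibility costs)
    pairwise = list of (i, j, w): smoothness weight paid if adjacent cells i and j differ
    """
    n = len(unary)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, (cost_in, cost_out) in enumerate(unary):
        cap[s][i] += cost_in    # this edge is cut if cell i ends up inside
        cap[i][t] += cost_out   # this edge is cut if cell i ends up outside
    for i, j, w in pairwise:
        cap[i][j] += w
        cap[j][i] += w

    def bfs_parents():
        # Breadth-first search over edges with remaining residual capacity.
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > eps:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest residual paths until t is unreachable.
    while True:
        parent = bfs_parents()
        if parent[t] == -1:
            break
        flow, v = float("inf"), t
        while v != s:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

    # Cells still reachable from s in the residual graph lie on the source (outside) side.
    parent = bfs_parents()
    return ["outside" if parent[i] != -1 else "inside" for i in range(n)]

# Five cells along one viewing ray: free space, free space, near the point, behind it, behind it.
unary = [(3.0, 0.0), (3.0, 0.0), (2.0, 0.5), (0.0, 3.0), (0.0, 3.0)]
pairwise = [(k, k + 1, 1.0) for k in range(4)]
print(min_cut_labels(unary, pairwise))
# → ['outside', 'outside', 'outside', 'inside', 'inside']
```

Cells traversed by many (real or virtual) viewing rays receive a high cost for being inside, which is where learned visibility and adaptive weighting would plausibly enter; the surface mesh is then the set of faces separating differently labeled cells.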
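The TSDF-based mesh fusion contribution can be illustrated in one dimension. The sketch assumes that each input mesh has already been rendered into a depth along a single viewing ray (in the described method, the panoramic virtual cameras would supply such depths) and fuses those depths with the standard weighted running average of truncated signed distances. `fuse_tsdf`, `zero_crossing`, the truncation distance and the per-mesh weights are hypothetical choices for illustration, not the dissertation's parameters.

```python
import numpy as np

def fuse_tsdf(voxel_depths, observations, trunc):
    """Fuse (surface_depth, weight) observations into a TSDF sampled at voxel_depths."""
    tsdf = np.zeros(len(voxel_depths))
    weight = np.zeros(len(voxel_depths))
    for surface_depth, w in observations:
        sdf = surface_depth - voxel_depths           # > 0 in front of the surface
        valid = sdf > -trunc                         # skip voxels far behind it
        d = np.clip(sdf / trunc, -1.0, 1.0)          # truncate and normalise to [-1, 1]
        new_weight = weight + w * valid
        fused = (tsdf * weight + d * w) / np.maximum(new_weight, 1e-12)
        tsdf = np.where(valid, fused, tsdf)          # running weighted average
        weight = new_weight
    return tsdf, weight

def zero_crossing(voxel_depths, tsdf, weight):
    """Linearly interpolate the first +/- sign change between observed voxels."""
    for k in range(len(tsdf) - 1):
        a, b = tsdf[k], tsdf[k + 1]
        if weight[k] > 0 and weight[k + 1] > 0 and a > 0 >= b:
            t = a / (a - b)
            return voxel_depths[k] + t * (voxel_depths[k + 1] - voxel_depths[k])
    return None

voxels = np.linspace(0.0, 2.0, 21)  # voxel centres along one ray, 0.1 apart
tsdf, weight = fuse_tsdf(voxels, [(1.0, 1.0), (1.2, 1.0)], trunc=0.5)
print(zero_crossing(voxels, tsdf, weight))  # approximately 1.1
```

With equal weights the fused surface lies midway between the two input surfaces; giving one observation a larger weight pulls the zero crossing toward it, which is one way confidence in one mesh over another could be expressed.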
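Why a known sun position and estimable geometry help with albedo recovery can be shown with a deliberately simplified forward model. Assuming a Lambertian surface lit by a directional sun plus a constant sky term, the observed intensity is I = rho * (E_sun * max(n·s, 0) * v + E_sky), where rho is the albedo, n the surface normal taken from the reconstructed geometry, s the sun direction and v a sun-visibility (shadow) flag; albedo then follows by dividing out the shading. This is a hypothetical sketch, not the dissertation's inverse rendering framework, which is more advanced; the irradiance values, the East-North-Up convention and the function names are assumptions.

```python
import numpy as np

def sun_direction(azimuth_deg, elevation_deg):
    """Unit vector toward the sun in an East-North-Up frame (azimuth clockwise from north)."""
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    return np.array([np.cos(el) * np.sin(az), np.cos(el) * np.cos(az), np.sin(el)])

def recover_albedo(intensity, normals, sun_dir, sun_visible, e_sun, e_sky):
    """Invert I = rho * (e_sun * max(n.s, 0) * v + e_sky) per pixel (assumed Lambertian model)."""
    cos_theta = np.clip(normals @ sun_dir, 0.0, None)   # back-facing pixels get no direct sun
    shading = e_sun * cos_theta * sun_visible + e_sky
    return intensity / np.maximum(shading, 1e-6)         # guard against zero shading

# Two upward-facing pixels with the same albedo, one sunlit and one in shadow.
normals = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
sun_visible = np.array([1.0, 0.0])
sun = sun_direction(135.0, 30.0)
true_albedo = 0.4
intensity = true_albedo * (1.0 * (normals @ sun) * sun_visible + 0.2)
print(intensity)  # the sunlit pixel is brighter than the shadowed one
print(recover_albedo(intensity, normals, sun, sun_visible, e_sun=1.0, e_sky=0.2))
```

Both recovered values come out at approximately 0.4, even though the raw intensities differ by more than a factor of three: once the sun direction and surface normals are known, the shading can be divided out rather than learned.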