Abstract: Human vision routinely perceives the full 3D shape of objects and scenes from a single 2D image, with occluded parts "filled in" by prior visual knowledge. Computing the 3D structures of all the objects in a scene from a single image is therefore a fundamental problem in computer vision. In this thesis, we propose a bottom-up/top-down Bayesian inference framework that computes the 3D structures of the objects in a scene from a single image. The framework integrates the visual tasks involved (segmentation, perceptual grouping, object detection and recognition, and 3D reconstruction) in a principled way and incorporates prior visual knowledge into the inference. Its output is a hierarchical "parsing graph" with the scene label at the top (the root), objects with 3D structures and their parts at intermediate nodes, and image pixels at the bottom. The number of layers in the parsing graph is determined by the types of objects or visual patterns present. The nodes correspond to visual patterns represented by probabilistic models. The parsing graph has both top-down connections and horizontal spatial connections, which correspond, respectively, to generative models and to spatial relations modeled by Markov random fields (MRFs). Formulated in a Bayesian framework, the inference algorithm computes the parsing graph from the input image by maximizing a posterior probability. This optimization integrates two popular computing paradigms in computer vision: generative methods and discriminative methods. The former formulate the posterior probability to be maximized in terms of generative models for images, defined by likelihood functions and priors. The latter compute discriminative proposals using bottom-up tests that drive the maximization through the solution space. As a result, the inference algorithm achieves both speed and consistency.
We also investigate three mechanisms for constructing the parsing graph efficiently, chosen according to the properties of the visual patterns being computed: a bottom-up mechanism, a top-down mechanism, and a combined bottom-up/top-down mechanism.
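
The interplay between generative scoring and discriminative proposals described in the abstract can be illustrated with a deliberately small sketch. It is not the thesis's algorithm: it assumes a toy 1D "image", a piecewise-constant interpretation W given by cut positions, a Gaussian likelihood, a prior that penalizes each cut, and a greedy acceptance rule. The function names (`log_posterior`, `bottom_up_proposals`, `infer_cuts`) and the parameters `sigma`, `penalty`, and `threshold` are all hypothetical and chosen only for illustration.

```python
def log_posterior(signal, cuts, sigma=1.0, penalty=5.0):
    """Return log p(I | W) + log p(W), up to a constant.

    W is a piecewise-constant interpretation defined by `cuts`
    (toy assumption, not the thesis's scene model).
    """
    bounds = [0, *sorted(cuts), len(signal)]
    log_lik = 0.0
    for start, end in zip(bounds, bounds[1:]):
        segment = signal[start:end]
        mean = sum(segment) / len(segment)
        # Generative model: each sample is its segment mean plus Gaussian noise.
        log_lik -= sum((x - mean) ** 2 for x in segment) / (2 * sigma ** 2)
    # Prior: every additional segment boundary costs a fixed penalty.
    log_prior = -penalty * len(cuts)
    return log_lik + log_prior


def bottom_up_proposals(signal, threshold=2.0):
    """Discriminative test: propose a cut where neighboring samples differ strongly."""
    return [i for i in range(1, len(signal))
            if abs(signal[i] - signal[i - 1]) > threshold]


def infer_cuts(signal):
    """Greedily accept bottom-up proposals that raise the posterior score."""
    cuts = set()
    best = log_posterior(signal, cuts)
    for cut in bottom_up_proposals(signal):
        score = log_posterior(signal, cuts | {cut})
        if score > best:  # top-down check: keep only proposals the model supports
            cuts.add(cut)
            best = score
    return sorted(cuts)


signal = [0.1, -0.2, 0.0, 5.1, 4.9, 5.0, 5.2, 0.0, 0.1]
print(infer_cuts(signal))                     # two clear steps in the signal
print(infer_cuts([0.0, 0.0, 2.5, 0.0, 0.0]))  # isolated spike
```

In this sketch the bottom-up test keeps the search cheap by scoring only a few candidate cuts, while the posterior decides whether each one is kept. For the isolated spike, both proposed cuts are rejected because the gain in likelihood does not outweigh the prior penalty.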

Bayesian Reconstruction of 3D Shapes and Scenes from a Single Image

Computing three-dimensional scene from a single image by bottom-up/top-down Bayesian inference

Model-driven sketch reconstruction with structure-oriented retrieval

Bayesian mesh reconstruction from noisy point data

Enhanced 3D Shape Reconstruction With Knowledge Graph of Category Concept

Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

3D Reconstruction from a Single Still Image Based on Monocular Vision of an Uncalibrated Camera

A Metric Learning Method for Image-based 3D Shape Retrieval

Bayes3D: fast learning and inference in structured generative models of 3D objects and scenes

Single Image Based Three-Dimensional Scene Reconstruction Using Semantic and Geometric Priors

Visual Odometry Based 3D-Reconstruction

Robust Bayesian Scene Reconstruction by Leveraging Retrieval-Augmented Priors

Single-Image 3D Scene Parsing Using Geometric Commonsense

Hypothesize and Bound: A Computational Focus of Attention Mechanism for Simultaneous 3D Shape Reconstruction, Pose Estimation and Classification from a Single 2D Image

A Stochastic Algorithm for 3D Scene Segmentation and Reconstruction

Automatic Single View Building Reconstruction by Integrating Segmentation

Neural Implicit 3D Shapes from Single Images with Spatial Patterns

Single Image 3D Object Estimation with Primitive Graph Networks

Neural 3D Scene Reconstruction from Multiple 2D Images without 3D Supervision

Enhanced Depth Estimation and 3D Geometry Reconstruction using Bayesian Helmholtz Stereopsis with Belief Propagation

Incremental 3D reconstruction using Bayesian learning